1. Abstract

Let $X_t$ and $Y_t$ be two Markov chains, on state spaces $\hat{\Omega} \subset \Omega$. In this paper, we
discuss how to prove bounds on the spectrum of $X_t$ based on bounds on the spectrum
of $Y_t$. This generalizes work of Diaconis, Saloff-Coste, Yuen and others on comparison
of chains in the case $\hat{\Omega} = \Omega$. The main tool is the extension of functions from the
smaller space to the larger, which allows comparison of the entire spectrum of the two
chains. The theory is used to give quick analyses of several chains without symmetry.
The main application is to a 'random transposition' walk on derangements.

2. Introduction 

One major tool in the theory of finite-state Markov chains has been the comparison
technique introduced by Diaconis and Saloff-Coste in the papers [4] and [5]. This
theory allows users to analyze the mixing of a Markov chain in terms of the mixing
properties of another Markov chain with the same state space, as long as their
stationary distributions are not too different. Practically, this may be useful because
a chain of interest can be related to a similar but much simpler or more symmetric
chain. In many natural examples, however, one expects Markov chains with different
state spaces to have similar behaviour. For example, we might expect that removing a
small number of vertices at random from a graph would generally have a small impact
on the spectral gap of the associated Markov chains. The bounds in [4] and [5] do not
apply in this situation. This paper presents one way to close this gap in the
literature, and we demonstrate the usefulness of this approach by deriving new bounds for
several natural chains. See [10] for a useful survey on different ways to apply existing
comparison techniques in different contexts. Essentially all of the techniques in that
paper apply in the context of different state spaces.

Sections 3 and 4 of this paper deal with the theory of comparisons for distinct finite state
spaces. These bounds are closely related to those found in [4] and [5]. Although the
notation is quite different, they are also closely related to ideas found in the papers [8]
and [9]. Those papers compared random walks on products of groups to some slightly
restricted versions, and in particular had the first special examples of comparison of
Markov chains with different state spaces. To our knowledge, the only other example
is Raymer's thesis [21].

Sections 5 and 6 apply the bounds obtained in the first part. We begin by look- 
ing at random walks on graphs with 'some vertices removed' that are analogous to 
the random walks on graphs with 'some edges removed' studied in The main 
application in this paper deals with a random walk on derangements, obtained by 
comparison with a similar random walk on permutations. This is a simple example 
of a Markov chain on permutations with restrictions, part of a class first studied
for statistical applications in [3]. Closely related chains have been studied with very
different spectral methods in [1]; the same chain was studied in [15].

Sections 7-9 of this paper describe some closely related comparison bounds. This
begins in section 7 with an analogous bound for discrete-time Markov chains on
continuous state spaces, following the work of [24]. In section 8, we discuss the
removal of some technical conditions, such as laziness. Finally, in section 9, we extend
the results to another technique, the spectral profile described in [11]. We then use
this to sharpen our mixing bound for an earlier example. It has been proved that bounds
obtained from the spectral profile are almost right for all chains on finite state spaces.
Although we don't derive these comparisons explicitly, the discussion in section 9
applies with few changes to many other comparison inequalities based on functional
analysis. An excellent survey of such bounds can be found in [20].



3. Notation, Background and Statement of Results 

To begin, we consider a $\frac{1}{2}$-lazy, ergodic, irreducible, reversible Markov chain on
state space $\Omega$, with transition kernel $K$ and stationary distribution $\mu$ (see section 7
for remarks on obtaining related results under relaxed assumptions). We will
compare this to another $\frac{1}{2}$-lazy, ergodic, irreducible, reversible Markov chain on state
space $\hat{\Omega} \subset \Omega$, with transition kernel $Q$ and stationary distribution $\nu$. We will
begin by comparing Dirichlet forms and log-Sobolev constants for these two chains.
Throughout, we will assume that we have satisfactory information about the chain $K$
on the larger space, and use this to find bounds for the chain $Q$ on the smaller space.

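For a general chain on space $X$ with kernel $P$ and stationary distribution $\pi$, and
functions $f$ on $X$, we define the following functions:

(1) $V_\pi(f) = \frac{1}{2} \sum_{x,y \in X} |f(x) - f(y)|^2 \pi(x) \pi(y)$

(2) $\mathcal{E}_P(f,f) = \frac{1}{2} \sum_{x,y \in X} |f(x) - f(y)|^2 P(x,y) \pi(x)$

(3) $L_\pi(f) = \sum_{x \in X} |f(x)|^2 \log\left( \frac{|f(x)|^2}{\|f\|_{2,\pi}^2} \right) \pi(x)$

(4) $\|f\|_{2,\pi}^2 = \sum_{x \in X} |f(x)|^2 \pi(x)$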



These quantities will be used to describe the spectral gap and log-Sobolev constants
of the associated Markov chains. Recall that if $P$ is a reversible, ergodic, irreducible,
$\frac{1}{2}$-lazy kernel, it has $|X|$ real eigenvalues satisfying

$1 = \beta_0(P) > \beta_1(P) \ge \dots \ge \beta_{|X|-1}(P) \ge 0$

By the variational characterization of eigenvalues, the spectral gap satisfies

(5) $1 - \beta_1(P) = \inf_{f : V_\pi(f) \neq 0} \frac{\mathcal{E}_P(f,f)}{V_\pi(f)}$

As in [9], the log-Sobolev constant can similarly be characterized by

(6) $\alpha(P) = \inf_{f : L_\pi(f) \neq 0} \frac{\mathcal{E}_P(f,f)}{L_\pi(f)}$
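As a numerical illustration of the variational characterization of the spectral gap, the following Python sketch evaluates the ratio of Dirichlet form to variance for the $\frac{1}{2}$-lazy simple random walk on a cycle. The choice of the cycle, the function names and the two test functions are assumptions made for illustration only; the closed form used for the gap is the standard eigenvalue formula for this particular walk.

```python
import math

def lazy_cycle_kernel(n):
    """1/2-lazy simple random walk on the n-cycle (n >= 3), as a dense matrix."""
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        P[x][x] = 0.5
        P[x][(x + 1) % n] += 0.25
        P[x][(x - 1) % n] += 0.25
    return P

def variance(f, pi):
    # V_pi(f) = (1/2) sum_{x,y} |f(x) - f(y)|^2 pi(x) pi(y)
    n = len(f)
    return 0.5 * sum((f[x] - f[y]) ** 2 * pi[x] * pi[y]
                     for x in range(n) for y in range(n))

def dirichlet(f, P, pi):
    # E_P(f, f) = (1/2) sum_{x,y} |f(x) - f(y)|^2 P(x, y) pi(x)
    n = len(f)
    return 0.5 * sum((f[x] - f[y]) ** 2 * P[x][y] * pi[x]
                     for x in range(n) for y in range(n))

n = 12
P = lazy_cycle_kernel(n)
pi = [1.0 / n] * n                      # uniform stationary distribution
gap = (1 - math.cos(2 * math.pi / n)) / 2   # 1 - beta_1 for this walk

# An eigenfunction for beta_1 attains the infimum; any other
# non-constant function gives a ratio at least as large.
f_opt = [math.cos(2 * math.pi * x / n) for x in range(n)]
f_other = [float(x * x) for x in range(n)]
ratio_opt = dirichlet(f_opt, P, pi) / variance(f_opt, pi)
ratio_other = dirichlet(f_other, P, pi) / variance(f_other, pi)
```

The same two helper functions apply to any finite reversible kernel given as a matrix together with its stationary distribution.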

Our general approach, when possible, is to use the following theorem (see Theorem 
2.2 of [§): 

Theorem 1 (Mixing Time Bound via Spectral Gap and Log-Sobolev Constant). For 

a \-lazy reversible Markov chain X t started at X = x, and for t > 1 + iz^jpj + 

telP)lOglog(^y), 

||£(X t )-7r|| <2e" c 

When the log-Sobolev constant a(P) is available, this is often better than the usual 
bound in terms of just the spectral gap (see Theorem 12.3 of [18]), which gives, for 

1 > T^MP) M^y), the bound 

(7) \\C{X t )-Tt\\ <2e~ c 

We will see shortly that, for many examples, it will be easy to find a very reasonable
bound for $\alpha(P)$ after doing the work needed to bound $\beta_1(P)$.
It is now time to compare the functionals defined above. For the
remainder of this note, $f$ will denote a function on $\hat{\Omega}$, and $\tilde{f}$ will denote a function on
$\Omega$ satisfying $\tilde{f}(x) = f(x)$ for all $x \in \hat{\Omega}$. We call such a function an extension of $f$.
We note that the inequalities

$V_\nu(f) \le C_1 V_\mu(\tilde{f})$
$L_\nu(f) \le C_2 L_\mu(\tilde{f})$
$\mathcal{E}_K(\tilde{f}, \tilde{f}) \le C_3 \mathcal{E}_Q(f, f)$

together with the variational characterizations of $\beta_1$ and $\alpha$ given in equations (5)
and (6) imply the following bounds on $\beta_1(Q)$ and $\alpha(Q)$ in terms of $\beta_1(K)$ and $\alpha(K)$:

$1 - \beta_1(Q) \ge \frac{1}{C_1 C_3} (1 - \beta_1(K))$
$\alpha(Q) \ge \frac{1}{C_2 C_3} \alpha(K)$

Finding a good value for $C_3$ is difficult and the main object of this paper, but
reasonable bounds on $C_1$ and $C_2$ can be found immediately. The following lemma
is useful when $\mu$ and $\nu$ assign similar values to all points in $\hat{\Omega}$, which is the case for
many natural examples.



Lemma 2 (Comparison of Variance and Log-Sobolev Constants). Let $\tilde{f}$ be any
extension of $f$, and let $C = \sup_{y \in \hat{\Omega}} \frac{\nu(y)}{\mu(y)}$. Then

$V_\nu(f) \le C V_\mu(\tilde{f})$
$L_\nu(f) \le C L_\mu(\tilde{f})$



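Proof. Define, for $c$ real (respectively real and strictly positive), the following
functionals:

$V_\pi(f, c) = \sum_x |f(x) - c|^2 \pi(x)$

$L_\pi(f, c) = \sum_x \left( |f(x)|^2 \log |f(x)|^2 - |f(x)|^2 \log(c) - |f(x)|^2 + c \right) \pi(x)$

Recall that $V_\pi(f) = \inf_{c \in \mathbb{R}} V_\pi(f, c)$, and it is shown in [13] that
$L_\pi(f) = \inf_{c \in \mathbb{R}, c > 0} L_\pi(f, c)$. Thus, we can write

$V_\nu(f, c) = \sum_{x \in \hat{\Omega}} |f(x) - c|^2 \nu(x)$
$\le C \sum_{x \in \hat{\Omega}} |f(x) - c|^2 \mu(x)$
$\le C \sum_{x \in \Omega} |\tilde{f}(x) - c|^2 \mu(x)$
$= C V_\mu(\tilde{f}, c)$

which implies $V_\nu(f) = \inf_{c \in \mathbb{R}} V_\nu(f, c) \le C \inf_{c \in \mathbb{R}} V_\mu(\tilde{f}, c) = C V_\mu(\tilde{f})$. An analogous
calculation, using the fact that every summand of $L_\pi(f, c)$ is non-negative, shows that
$L_\nu(f, c) \le C L_\mu(\tilde{f}, c)$, which implies $L_\nu(f) \le C L_\mu(\tilde{f})$. ■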
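The two inequalities of Lemma 2 can be checked numerically on a toy example. The sketch below is an illustration under assumptions: the six-point space, the random measures and all function names are hypothetical, and the extension is chosen arbitrarily, since the lemma holds for any extension.

```python
import math
import random

def variance(f, pi):
    # Equals (1/2) sum_{x,y} |f(x) - f(y)|^2 pi(x) pi(y) for a probability pi.
    mean = sum(f[x] * pi[x] for x in pi)
    return sum((f[x] - mean) ** 2 * pi[x] for x in pi)

def log_functional(f, pi):
    # L_pi(f) = sum_x |f(x)|^2 log(|f(x)|^2 / ||f||^2) pi(x), with 0 log 0 = 0.
    norm2 = sum(f[x] ** 2 * pi[x] for x in pi)
    return sum(f[x] ** 2 * math.log(f[x] ** 2 / norm2) * pi[x]
               for x in pi if f[x] != 0.0)

def random_distribution(states, rng):
    w = {x: rng.uniform(0.5, 1.5) for x in states}
    total = sum(w.values())
    return {x: w[x] / total for x in states}

rng = random.Random(0)
big = list(range(6))          # plays the role of Omega
small = big[:4]               # plays the role of the smaller space
mu = random_distribution(big, rng)
nu = random_distribution(small, rng)
C = max(nu[y] / mu[y] for y in small)

def lemma2_holds(trials=200):
    """Check both inequalities for random f and random extensions of f."""
    for _ in range(trials):
        f = {y: rng.uniform(-2, 2) for y in small}
        f_ext = dict(f)
        f_ext.update({x: rng.uniform(-2, 2) for x in big if x not in f})
        if variance(f, nu) > C * variance(f_ext, mu) + 1e-12:
            return False
        if log_functional(f, nu) > C * log_functional(f_ext, mu) + 1e-12:
            return False
    return True
```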

As with the extension theory built up from [4], it is possible to get bounds on the
entire spectrum of $Q$, rather than just the second-largest eigenvalue. Unlike that case,
this will require the extensions to have some structure. In particular, fix a map $M$
from $\mathbb{R}^{\hat{\Omega}}$ to $\mathbb{R}^{\Omega}$ so that for all $f \in \mathbb{R}^{\hat{\Omega}}$, $Mf \in \mathbb{R}^{\Omega}$ is an extension of $f$. Assume that
we can show

(8) $\mathcal{E}_K(Mf, Mf) \le C_3 \mathcal{E}_Q(f, f)$

for all $f \in \mathbb{R}^{\hat{\Omega}}$. Next, consider a Hermitian matrix $P$ with real eigenvalues $\lambda_1 \ge
\dots \ge \lambda_n$, and define for any subspace $W$ the functions

$L(W) = \min\left\{ \frac{\langle Pf, f \rangle}{\langle f, f \rangle} : f \in W \right\}$
$U(W) = \max\left\{ \frac{\langle Pf, f \rangle}{\langle f, f \rangle} : f \in W \right\}$

Then recall from e.g. page 185 of [14] that the eigenvalues of $P$ satisfy

$\lambda_i = \max\{L(W) : \dim(W^\perp) = i\} = \min\{U(W) : \dim(W) = i + 1\}$

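This variational characterization, together with inequality (8) and $C_1 = \sup_{y \in \hat{\Omega}} \frac{\nu(y)}{\mu(y)}$,
gives the bounds

$1 - \beta_i(Q) \ge \frac{1}{C_1 C_3} (1 - \beta_i(K))$
$\alpha(Q) \ge \frac{1}{C_1 C_3} \alpha(K)$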
The main difficulty will be to bound the Dirichlet forms $\mathcal{E}_Q$ and $\mathcal{E}_K$. We begin
by restricting our attention to the special class of simple random walks on regular
graphs, and then write a bound for general finite Markov chains.

Assume that $K$ is a $\frac{1}{2}$-lazy simple random walk on $\Omega$, with associated graph $G =
(V, E)$, which is regular of degree $d$. That is, the kernel is given by:

$K(x,y) = \frac{1}{2}$ if $y = x$
$K(x,y) = \frac{1}{2d}$ if $(x,y) \in E$
$K(x,y) = 0$ otherwise

Then let $\hat{G} = (\hat{V}, \hat{E})$ be a subgraph of $G$, where $\hat{V}$ is obtained from $V$ by removing
$m$ vertices, and $\hat{E}$ is obtained from $E$ by removing all edges in $E$ adjacent to one of
the removed vertices. Then let $Q$ be a random walk on $\hat{G}$ described by

$Q(x,y) = \frac{1}{2}\left(2 - \frac{1}{d} \deg(x)\right)$ if $y = x$
$Q(x,y) = \frac{1}{2d}$ if $(x,y) \in \hat{E}$
$Q(x,y) = 0$ otherwise

where $\deg(x)$ is the number of vertices in $\hat{G}$ adjacent to $x$. $Q$ is the Metropolis-
Hastings walk associated with base walk $K$ and target distribution uniform on $\hat{G}$
(see [19] for an introduction to the Metropolis-Hastings algorithm).
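The construction of the Metropolized walk on a graph with some vertices removed can be sketched in a few lines of Python. The example below uses the torus with a couple of removed vertices purely as an illustration; the function names and the dictionary-of-rows representation are assumptions, not notation from the paper.

```python
def torus_neighbors(v, n):
    """The four neighbours of v in the n-by-n torus (n >= 3)."""
    i, j = v
    return [((i + 1) % n, j), ((i - 1) % n, j),
            (i, (j + 1) % n), (i, (j - 1) % n)]

def metropolized_kernel(n, removed):
    """Kernel of the walk on the torus minus `removed`.

    Each surviving edge keeps probability 1/(2d) with d = 4; the mass that
    the lazy walk on the full torus would send to removed vertices is added
    to the holding probability, giving 1 - deg(x)/(2d) on the diagonal.
    """
    d = 4
    kept = [(i, j) for i in range(n) for j in range(n) if (i, j) not in removed]
    Q = {}
    for x in kept:
        nbrs = [y for y in torus_neighbors(x, n) if y not in removed]
        row = {y: 1.0 / (2 * d) for y in nbrs}
        row[x] = 1.0 - len(nbrs) / (2 * d)
        Q[x] = row
    return Q

removed = {(0, 0), (2, 2)}
Q = metropolized_kernel(5, removed)
```

Since the off-diagonal entries are symmetric, the walk is reversible with respect to the uniform distribution on the remaining vertices.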

To describe the comparison, it will be necessary first to choose a specific extension
$\tilde{f}$ of $f$. For each vertex $x \in G$, fix some probability measure $P_x[y]$ on $\hat{G}$, requiring
$P_x[y] = \delta_x[y]$ for $x \in \hat{G}$. This defines a family of extensions by

(9) $\tilde{f}(x) = \sum_{y \in \hat{G}} P_x[y] f(y)$

Next, for each pair $(x,y) \in E$, fix a joint measure $P_{x,y}[a,b]$ on $\hat{G} \times \hat{G}$ satisfying
$\sum_a P_{x,y}[a,b] = P_y[b]$ for all $b \in \hat{G}$ and $\sum_b P_{x,y}[a,b] = P_x[a]$ for all $a \in \hat{G}$. This is a
coupling of the distributions $P_x$, $P_y$.

Next, for each $a, b \in \hat{G}$ with $P_{x,y}[a,b] > 0$ for some $(x,y) \in E$, it is necessary to define a flow in
$\hat{G}$ from $a$ to $b$. To do so, call a sequence of vertices $\gamma = \{a = v_{0,a,b}, v_{1,a,b}, \dots, v_{k[\gamma],a,b} =
b\}$ a path from $a$ to $b$ if $(v_{i,a,b}, v_{i+1,a,b}) \in \hat{E}$ for all $0 \le i < k[\gamma]$. Then let $\Gamma_{a,b}$ be the
collection of all paths from $a$ to $b$. Call a function $F$ from paths to $[0,1]$ a flow if
$\sum_{\gamma \in \Gamma_{a,b}} F[\gamma] = 1$ for all $a, b \in \hat{\Omega}$. We will often write $G_{a,b}$ for the restriction of $F$ to
$\Gamma_{a,b}$. Finally, for a path $\gamma \in \Gamma_{a,b}$, we will label its initial and final vertices by $i(\gamma) = a$,
$o(\gamma) = b$.

For fixed measures $\{P_x\}_{x \in \Omega}$, couplings $\{P_{x,y}\}_{(x,y) \in E}$ and flow $F$, we obtain the
following bound on Dirichlet forms:
Theorem 3 (Comparison of Dirichlet Forms for Metropolized Simple Random Walk).

For flows, distributions, and paths as described above,



^ , — 77 — 777 

S K (fJ)< n ^M Q (f,f) 



where 



A= sup (1 + 2 %]^[7]1>X7)] 

iq ^ E 73(<l,r) y*G 

79(9.'') (x,y)£E,x,y£G 

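For general Markov chains $K$ and $Q$, define a graph $G$ with vertex set $\Omega$ associated
with $K$ by creating an edge $(x,y) \in E$ if $K(x,y) > 0$, and a graph $\hat{G}$ with vertex set
$\hat{\Omega}$ associated with $Q$ by creating an edge $(x,y) \in \hat{E}$ if $Q(x,y) > 0$. The same setup
then gives the following bound:

Theorem 4 (Comparison of Dirichlet Forms for General Chains). For flows,
distributions and couplings as described above,

$\mathcal{E}_K(\tilde{f}, \tilde{f}) \le A \, \mathcal{E}_Q(f, f)$

where

$A = \sup_{Q(q,r) > 0} \frac{1}{Q(q,r) \nu(q)} \Big( \sum_{\gamma \ni (q,r)} F[\gamma] k[\gamma] K(i(\gamma), o(\gamma)) \mu(i(\gamma))$
$\quad + 2 \sum_{\gamma \ni (q,r)} k[\gamma] F[\gamma] \sum_{y \notin \hat{G}} P_y[o(\gamma)] K(i(\gamma), y) \mu(i(\gamma))$
$\quad + \sum_{\gamma \ni (q,r)} F[\gamma] k[\gamma] \sum_{(x,y) \in E, \, x,y \notin \hat{G}} P_{x,y}[i(\gamma), o(\gamma)] K(x,y) \mu(x) \Big)$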

We will now describe some applications of Theorem 3. The first is analogous to
example 2.1 of [5]. Let $K$ be the kernel of the $\frac{1}{2}$-lazy simple random walk on the
torus $G = \mathbb{Z}_n^2$ with edges of the form $((i,j), (i+1,j))$ and $((i,j), (i,j+1))$. Then
let $v_1, v_2, \dots, v_m \in G$ be any collection of vertices with the property that no two
are in the same square $\{(i,j), (i+1,j), (i,j+1), (i+1,j+1)\}$ in $G$. Then let $Q$ be
the Metropolis-Hastings walk associated with $G \setminus \{v_1, v_2, \dots, v_m\}$. The following is a
general bound on the Dirichlet form of $Q$:

Theorem 5 (Comparison for Random Walk on the Torus with Holes). All functions
$f$ on $G \setminus \{v_1, v_2, \dots, v_m\}$ have extensions $\tilde{f}$ to $G$ so that

$\mathcal{E}_K(\tilde{f}, \tilde{f}) \le 6 \left(1 - \frac{m}{n^2}\right) \mathcal{E}_Q(f, f)$
Not all bounds are so useful. For example, if the removed vertices are of the form
{(1, 1), (2, 2), . . . , (n - 1, n - 1)} U {(1, 1 + f ), (2, 2 + f ), . . . , (n - 1, n + 1 - f )}, we 
have: 

Theorem 6 (Comparison for Random Walk on the Torus with Bottleneck). All 
functions f on G\{(1, 1), (1, 1 + f ), (2, 2 + § ), . . . , (n - l,n - l)(n - 1, n + 1 - f )} 
have extensions $\tilde{f}$ to $G$ so that

$\mathcal{E}_K(\tilde{f}, \tilde{f}) \le 8 n^2 \, \mathcal{E}_Q(f, f)$

The result is the same upper bound as is given directly by Cheeger's inequality (see
Theorem 13.14 of [18]). As discussed immediately after the proof, it seems impossible
to do any better by comparison to the standard simple random walk on the torus
using Theorem 3.

The main example in this paper is an application of Theorem 3 to the problem of
sampling from derangements. Recall that a permutation $\sigma \in S_n$ is called a
derangement if, for all $i \in [n]$, $\sigma(i) \neq i$. We will compare the well-known 'random
transposition' walk on $S_n$ to its restriction to the derangements $D_n$. More precisely, consider
the Cayley graph $G$ with vertex set $V = S_n$ and edge set $E$ given by $(x,y) \in E$ if
and only if $y^{-1} x$ is a transposition. We will compare the $\frac{1}{2}$-lazy transition kernel $K$
on $G$ to its Metropolized version $Q$ on the restriction to derangements $D_n \subset S_n$.
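One step of the Metropolized walk on derangements might be sketched as follows. This is an illustration under stated assumptions: permutations are stored as tuples with `perm[v]` the image of `v` (0-indexed), the walk holds with probability 1/2 and otherwise proposes a uniformly random transposition, and the helper names are hypothetical.

```python
import random

def is_derangement(perm):
    return all(perm[v] != v for v in range(len(perm)))

def step(perm, rng):
    """One step of the Metropolized random transposition walk on D_n."""
    n = len(perm)
    if rng.random() < 0.5:
        return perm                      # lazy step
    a, b = rng.sample(range(n), 2)       # uniformly random transposition (a, b)
    swap = {a: b, b: a}
    # Right-multiply by (a, b): apply perm first, then swap the values a and b.
    proposal = tuple(swap.get(v, v) for v in perm)
    # Proposals that leave D_n are rejected, so the walk stays put.
    return proposal if is_derangement(proposal) else perm

rng = random.Random(0)
state = tuple(list(range(1, 8)) + [0])   # the 8-cycle 0 -> 1 -> ... -> 7 -> 0
trajectory = [state]
for _ in range(2000):
    state = step(state, rng)
    trajectory.append(state)
```

Because every proposal is accepted or rejected without any further weighting, the stationary distribution of this sketch is uniform on the derangements.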

Although sampling from the set of derangements is not hard (it is easy to sample 
from S n and rejection-sampling based on this is fairly efficient), the Markov chain 
is closely related to several more difficult sampling problems. There has been a 
great deal of interest in the problem of sampling from permutations with restrictions, 
beginning with the work of Diaconis, Graham and Holmes in [3]. See also the recent
work [2], [1], and [15] and the references contained therein for a discussion of other
examples. Our main result is:

Theorem 7 (Dirichlet Form Comparison for the Random Transposition Walk on
Derangements). Fix $n > 10$. All functions $f$ on $D_n$ have extensions $\tilde{f}$ to $S_n$ so that

$\mathcal{E}_K(\tilde{f}, \tilde{f}) \le 22 (e + 1)(1 + \epsilon_n) \mathcal{E}_Q(f, f)$

where \e n \ < — . In the other direction, any function f on D n and any extension f of 
f to S n satisfies 

S K (fJ)>^ Q (f,f) 

We will show that this easily gives the following bound on the mixing time,
improving earlier bounds of $O(n^3 \log(n))$ [15]:
Corollary 8 (Mixing Properties of the Random Transposition Walk on
Derangements). The random walk described above has spectral gap satisfying

i-&(Q) = n 

and log-Sobolev constant 

a(Q) = (— W) 
\n login) / 

By Theorem 1, there exists some constant $a > 0$ and function $f(C)$ such that
$\lim_{C \to \infty} f(C) = 0$ and for $t = Cn + a n \log(n)^2$, $\|\mathcal{L}(X_t) - \pi\|_{TV} \le f(C)$.

In section 8, bounds similar to Theorem 4 are developed for discrete-time chains on
continuous state spaces. The development follows the discrete theory closely, much
as W. K. Yuen's development of comparison theory on continuous state spaces in [24]
follows the discrete theory in [4].

Next, in section 9, we briefly discuss how these extension ideas interact with a
recent and powerful way of looking at Dirichlet forms, the spectral profile. The main
results from [11] will be introduced. They will then be used to prove the following
improvement of Theorem 5:

Theorem 9 (Improved Comparison for Random Walk on the Torus with Holes). If
$X_t$ is a Markov chain as described in Theorem 5, we have for $t = Cn^2$,

$\|\mathcal{L}(X_t) - \nu\|_{TV} \le f(C)$

for some function $f$ independent of $n$ and the particular vertices removed, with
$\lim_{C \to \infty} f(C) = 0$.

In particular, these random walks have a mixing time that is $O(n^2)$. This is a
substantial improvement on the bound of $O(n^2 \log(n))$ obtained by a direct application
of inequality (7), and a small improvement on the bound of $O(n^2 \log(\log(n)))$ found
by a careful application of Theorem 1.




4. Spectral Gap and Log-Sobolev Estimates 
In this section, we prove Theorem 4.
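Proof. Assume without loss of generality that no paths contain repeated edges, and
write

$\mathcal{E}_K(\tilde{f}, \tilde{f}) = \frac{1}{2} \sum_{x,y \in \Omega} |\tilde{f}(x) - \tilde{f}(y)|^2 K(x,y) \mu(x)$

$= \frac{1}{2} \sum_{x,y \in \hat{\Omega}} |f(x) - f(y)|^2 K(x,y) \mu(x) + \sum_{x \in \hat{\Omega}, y \notin \hat{\Omega}} |f(x) - \tilde{f}(y)|^2 K(x,y) \mu(x)$

$\quad + \frac{1}{2} \sum_{x,y \notin \hat{\Omega}} |\tilde{f}(x) - \tilde{f}(y)|^2 K(x,y) \mu(x)$

$\equiv \frac{1}{2} R_1 + R_2 + \frac{1}{2} R_3$

The goal is to compare this to $\mathcal{E}_Q(f, f) = \frac{1}{2} \sum_{x,y \in \hat{\Omega}} |f(x) - f(y)|^2 Q(x,y) \nu(x)$. We
begin by looking at $R_1$. Writing $v_{i,x,y}$ for the $i$'th vertex of a path $\gamma \in \Gamma_{x,y}$,

$R_1 = \sum_{x,y \in \hat{\Omega}} |f(x) - f(y)|^2 K(x,y) \mu(x)$

$= \sum_{x,y \in \hat{\Omega}} \Big| \sum_{\gamma \in \Gamma_{x,y}} F[\gamma] \sum_{i=0}^{k[\gamma]-1} (f(v_{i+1,x,y}) - f(v_{i,x,y})) \Big|^2 K(x,y) \mu(x)$

$\le \sum_{x,y \in \hat{\Omega}} \sum_{\gamma \in \Gamma_{x,y}} F[\gamma] \Big| \sum_{i=0}^{k[\gamma]-1} (f(v_{i+1,x,y}) - f(v_{i,x,y})) \Big|^2 K(x,y) \mu(x)$

$\le \sum_{x,y \in \hat{\Omega}} \sum_{\gamma \in \Gamma_{x,y}} F[\gamma] k[\gamma] \sum_{i=0}^{k[\gamma]-1} (f(v_{i+1,x,y}) - f(v_{i,x,y}))^2 K(x,y) \mu(x)$

And so the coefficient $[(f(q) - f(r))^2] R_1$ of $(f(q) - f(r))^2$ in $R_1$ is at most

(10) $[(f(q) - f(r))^2] R_1 \le \sum_{\gamma \ni (q,r)} F[\gamma] k[\gamma] K(i(\gamma), o(\gamma)) \mu(i(\gamma))$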



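The next step is to bound $R_2$, which depends on the measures $P_x$ and flow $F$,
though not on the couplings $P_{x,y}$. Write:

$R_2 = \sum_{x \in \hat{\Omega}, y \notin \hat{\Omega}} |f(x) - \tilde{f}(y)|^2 K(x,y) \mu(x)$

$= \sum_{x \in \hat{\Omega}, y \notin \hat{\Omega}} \Big| \sum_{z \in \hat{\Omega}} P_y[z] (f(x) - f(z)) \Big|^2 K(x,y) \mu(x)$

$\le \sum_{x \in \hat{\Omega}, y \notin \hat{\Omega}} \sum_{z \in \hat{\Omega}} P_y[z] (f(x) - f(z))^2 K(x,y) \mu(x)$

where the last inequality is Cauchy-Schwarz. The next step is to write $(f(x) - f(z))^2$
in terms of differences which appear in $\mathcal{E}_Q(f,f)$. To do so, note that

(11) $(f(x) - f(z))^2 = \Big( \sum_{\gamma \in \Gamma_{x,z}} F[\gamma] \sum_{i=0}^{k[\gamma]-1} (f(v_{i+1,x,z}) - f(v_{i,x,z})) \Big)^2$

$\le \sum_{\gamma \in \Gamma_{x,z}} F[\gamma] \Big( \sum_{i=0}^{k[\gamma]-1} (f(v_{i+1,x,z}) - f(v_{i,x,z})) \Big)^2$

$\le \sum_{\gamma \in \Gamma_{x,z}} F[\gamma] k[\gamma] \sum_{i=0}^{k[\gamma]-1} (f(v_{i+1,x,z}) - f(v_{i,x,z}))^2$

where both inequalities are Cauchy-Schwarz. From this bound, the coefficient of
$(f(q) - f(r))^2$ in $R_2$ is at most

(12) $[(f(q) - f(r))^2] R_2 \le \sum_{\gamma \ni (q,r)} k[\gamma] F[\gamma] \sum_{y \notin \hat{G}} P_y[o(\gamma)] K(i(\gamma), y) \mu(i(\gamma))$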
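Finally, it is necessary to bound $R_3$. Write

$R_3 = \sum_{x,y \in \Omega \setminus \hat{\Omega}} |\tilde{f}(x) - \tilde{f}(y)|^2 K(x,y) \mu(x)$

$= \sum_{x,y \in \Omega \setminus \hat{\Omega}} \Big| \sum_{a,b \in \hat{\Omega}} P_{x,y}[a,b] (f(a) - f(b)) \Big|^2 K(x,y) \mu(x)$

$\le \sum_{x,y \in \Omega \setminus \hat{\Omega}} \sum_{a,b \in \hat{\Omega}} P_{x,y}[a,b] (f(a) - f(b))^2 K(x,y) \mu(x)$

Using inequality (11) above, this gives

(13) $R_3 \le \sum_{x,y \in \Omega \setminus \hat{\Omega}} \sum_{a,b \in \hat{\Omega}} P_{x,y}[a,b] \sum_{\gamma \in \Gamma_{a,b}} F[\gamma] k[\gamma] \sum_{i=0}^{k[\gamma]-1} (f(v_{i+1,a,b}) - f(v_{i,a,b}))^2 K(x,y) \mu(x)$

In particular, the coefficient of $(f(q) - f(r))^2$ in this upper bound is

$[(f(q) - f(r))^2] R_3 \le \sum_{\gamma \ni (q,r)} F[\gamma] k[\gamma] \sum_{(x,y) \in E, \, x,y \notin \hat{\Omega}} P_{x,y}[i(\gamma), o(\gamma)] K(x,y) \mu(x)$

Combining inequalities (10), (12) and (13), the coefficient of $(f(q) - f(r))^2$ in
$R_1 + 2R_2 + R_3$ is bounded by

$[(f(q) - f(r))^2] (R_1 + 2R_2 + R_3) \le \sum_{\gamma \ni (q,r)} F[\gamma] k[\gamma] K(i(\gamma), o(\gamma)) \mu(i(\gamma))$
$\quad + 2 \sum_{\gamma \ni (q,r)} k[\gamma] F[\gamma] \sum_{y \notin \hat{G}} P_y[o(\gamma)] K(i(\gamma), y) \mu(i(\gamma))$
$\quad + \sum_{\gamma \ni (q,r)} F[\gamma] k[\gamma] \sum_{(x,y) \in E, \, x,y \notin \hat{G}} P_{x,y}[i(\gamma), o(\gamma)] K(x,y) \mu(x)$

On the other hand, the coefficient of $(f(q) - f(r))^2$ in $\mathcal{E}_Q(f, f)$ is at least $Q(q,r) \nu(q)$.
Thus, setting

$A = \sup_{Q(q,r) > 0} \frac{1}{Q(q,r) \nu(q)} \Big( \sum_{\gamma \ni (q,r)} F[\gamma] k[\gamma] K(i(\gamma), o(\gamma)) \mu(i(\gamma))$
$\quad + 2 \sum_{\gamma \ni (q,r)} k[\gamma] F[\gamma] \sum_{y \notin \hat{G}} P_y[o(\gamma)] K(i(\gamma), y) \mu(i(\gamma))$
$\quad + \sum_{\gamma \ni (q,r)} F[\gamma] k[\gamma] \sum_{(x,y) \in E, \, x,y \notin \hat{G}} P_{x,y}[i(\gamma), o(\gamma)] K(x,y) \mu(x) \Big)$

we have

$\mathcal{E}_K(\tilde{f}, \tilde{f}) \le A \, \mathcal{E}_Q(f, f)$

which completes the proof.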

■ 

Theorem 3 is an immediate corollary.

5. Simple Examples 

In this section, we present brief proofs of Theorems 5 and 6, which follow quickly
from Theorem 3. We begin with Theorem 5.

Proof. First, we define the measures. If $x \in \{v_1, v_2, \dots, v_m\}$, let

$P_x[a] = \frac{1}{4}$ if $(x,a) \in E$
$P_x[a] = 0$ otherwise

By assumption, no two vertices in $\{v_1, v_2, \dots, v_m\}$ are adjacent, so there are no
choices to make when defining the couplings $P_{x,y}$. To define the flow, note that
$P_{x,y}[a,b] > 0$ only in three situations. We describe each situation up to swapping
coordinates and reflections about rows and columns:

Case 1: $(x,y) = (a,b)$. In this case, $(a,b) \in \hat{E}$, so define the flow to be
concentrated on that single edge.

Case 2: $x = (i,j) \in \hat{G}$, $y$ of the form $(i+1,j) \notin \hat{G}$, $a = x$, and $b$ of the
form $(i+1,j+1)$. In this case, define the flow to be concentrated on the path
$\{((i,j),(i,j+1)), ((i,j+1),(i+1,j+1))\}$.

Case 3: $x = (i,j) \in \hat{G}$, $y$ of the form $(i+1,j) \notin \hat{G}$, $a = x$, and $b$ of the form
$(i+2,j)$. In this case, there are two length 4 paths between $a$ and $b$, of the form
$\{((i,j),(i,j+1)), ((i,j+1),(i+1,j+1)), ((i+1,j+1),(i+2,j+1)), ((i+2,j+1),(i+2,j))\}$
and $\{((i,j),(i,j-1)), ((i,j-1),(i+1,j-1)), ((i+1,j-1),(i+2,j-1)), ((i+2,j-1),(i+2,j))\}$.
The flow should put equal weight on both.
Then, note that any edge can be in at most 1 path associated with case 1, 2 paths
associated with case 2, and 4 paths associated with case 3. Thus,
$A \le \frac{n^2 - m}{n^2} \left(1 + (2)(2)\frac{1}{4} + (4)(4)\frac{1}{4}\right) = 6 \left(1 - \frac{m}{n^2}\right)$.
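The inequality of Theorem 5 can be checked numerically for the averaging extension, in which each removed vertex receives the mean of $f$ over its four neighbours. The sketch below is illustrative only: the grid size, the particular holes, the random test functions and the function names are assumptions made for the example.

```python
import random

def torus_neighbors(v, n):
    i, j = v
    return [((i + 1) % n, j), ((i - 1) % n, j),
            (i, (j + 1) % n), (i, (j - 1) % n)]

def dirichlet_forms(n, holes, f):
    """Return (E_K(f_ext, f_ext), E_Q(f, f)) on the n-by-n torus with holes.

    f is a dict on the kept vertices; f_ext averages f over the four
    neighbours of each hole. Both kernels put mass 1/(2d) on each edge.
    """
    d = 4
    f_ext = dict(f)
    for v in holes:
        f_ext[v] = sum(f[a] for a in torus_neighbors(v, n)) / d
    e_K = 0.0
    e_Q = 0.0
    for i in range(n):
        for j in range(n):
            x = (i, j)
            # Each undirected edge is visited exactly once (n >= 3).
            for y in (((i + 1) % n, j), (i, (j + 1) % n)):
                sq = (f_ext[x] - f_ext[y]) ** 2
                e_K += sq
                if x not in holes and y not in holes:
                    e_Q += sq
    m = len(holes)
    # (1/2) * (two orientations) * (1/(2d)) * (uniform stationary mass)
    return e_K / (2 * d * n * n), e_Q / (2 * d * (n * n - m))

n = 8
holes = {(0, 0), (0, 2), (3, 5), (6, 3)}   # no two share a unit square
rng = random.Random(1)
bound_ok = True
for _ in range(100):
    f = {(i, j): rng.uniform(-1, 1)
         for i in range(n) for j in range(n) if (i, j) not in holes}
    e_K, e_Q = dirichlet_forms(n, holes, f)
    bound_ok = bound_ok and e_K <= 6 * (1 - len(holes) / n ** 2) * e_Q + 1e-12
```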



As mentioned in the introduction, this bound translates immediately into an
$O(n^2 \log(n))$ bound on the total variation mixing time, using inequality (7).
Unfortunately, although Theorem 5 can be used to get very good control on the entire
spectrum of the associated walk as per the comments immediately preceding Lemma
2, the lack of symmetry in the problem makes it difficult to use the smaller eigenvalues
to actually improve our estimate of the mixing time. In section 9, we will avoid this
problem and find the right bound up to the coefficient of the leading term using the
spectral profile.

The proof of Theorem [6] is similar: 

Proof. We begin by defining the measures. For $x \in \{v_1, v_2, \dots, v_m\}$, define

$P_x[a] = \frac{1}{4}$ if $(x,a) \in E$
$P_x[a] = 0$ otherwise

By assumption, no two vertices in $\{v_1, v_2, \dots, v_m\}$ are adjacent, so it isn't necessary
to define any couplings. To define the flow, put the entire weight on one minimal
length path. Since the number of pairs $(a,b)$ with $a$ and $b$ not adjacent but $P_{x,y}[a,b] > 0$
for some $x, y$ is at most $2n$, and the maximal path length is clearly at most $4n$, we
can write $A \le 8n^2$. ■

More importantly, it seems impossible to substantially improve this bound with
another comparison to simple random walk on the torus. The missing vertices
effectively divide the torus into two regions. There must be at least $\Omega(n)$ paths going
between the two regions, and the median path length must also be at least $\Omega(n)$.
Since $O(1)$ edges between the two regions exist, any path argument gives $A = \Omega(n^2)$.

6. The Random Transposition Walk on Derangements 

This section contains the proofs of Theorem 7 and Corollary 8. The proof is based
on an application of Theorem 3, and the strategy is quite simple. Say that $\tau$ is
an extension of $\sigma$ if every cycle of $\sigma$ is contained in some cycle of $\tau$, when cycles
are viewed as subsets of $[n]$. Roughly, for $x \in S_n \setminus D_n$, we will define measures $P_x$
supported on $D_n$ which are fairly uniform on derangements that are extensions of
$x$. We will then find a coupling $P_{x,y}[\sigma, \tau]$ of $P_x[\sigma]$ and $P_y[\tau]$ so that if $P_{x,y}[\sigma, \tau] > 0$,
there will be a sequence of derangements of length at most 4, starting with $\sigma$ and
ending with $\tau$, where adjacent derangements differ by a single transposition. Finally,
the flows will be supported on these minimal-length paths. To complete the proof,
we will describe for any fixed pair $q, r$ of derangements all pairs $x, y$ and all pairs $\sigma, \tau$
so that the edge $(q, r)$ is in a path from $\sigma$ to $\tau$ with $P_{x,y}[\sigma, \tau] > 0$. Most of the details
of the proof consist of examining a relatively large number of simple cases. For this
reason, we omit some details for cases very similar to those already discussed and
instead give only references to the first time the calculation is done.

We begin by setting some notation. Let $S_n$ and $D_n$ be the collection of permutations
and derangements on $[n]$, respectively. For $x \in S_n$, define $Fix(x) = \{i \in [n] : x[i] =
i\}$ to be the fixed points of $x$. We will multiply permutations from left to right, so
that e.g. $(1,3)(1,2) = (1,3,2)$. Finally, for $\sigma \in S_n$ and any subset $S$ of $[n]$ which is
exactly the union of cycles of $\sigma$, we will denote by $\sigma|_S$ the restriction of $\sigma$ to $S$. For
example, $(125)(34)|_{\{1,2,5\}} = (125)$. This is often useful for writing down explicit paths
along which most such restrictions don't change.

The next step is to describe the measures $P_x$ for $x \in S_n \setminus D_n$, their couplings $P_{x,y}$
for $x, y \in S_n \setminus D_n$, and flows $G_{\sigma,\tau}$ for pairs $\sigma, \tau \in D_n$ with $P_{x,y}[\sigma, \tau] > 0$ for some pair
$x, y \in S_n$. We begin with the measures $P_x[\sigma]$. Let $Fix(x) = F = (f_1, \dots, f_m)$, and
write the remaining cycles of $x$ as $x = C_1 C_2 \cdots C_k$, where $C_i = (p_{i,1}, p_{i,2}, \dots, p_{i,\ell(i)})$,
with $\ell(i) \ge 2$.

We now write several measures associated with $x \neq id$. Fix $z \in S_m$, and construct
$\sigma$ distributed according to $P_x^{(z)}$ as follows. For $1 \le i \le m$, let $a_i$ be chosen uniformly
in $[n] \setminus \cup_{j=i}^{m} \{f_{z(j)}\}$. Then write

(14) $\sigma = x (a_1, f_{z(1)}) (a_2, f_{z(2)}) \cdots (a_m, f_{z(m)})$

Note that the $i$'th transposition $(a_i, f_{z(i)})$ is the first time that $f_{z(i)}$ appears in the
sequence $(x \prod_{i=1}^{j} (a_i, f_{z(i)}))_{j=0}^{m}$, and so the $j$'th term in that sequence is obtained from
the $(j-1)$'st by adding $f_{z(j)}$ to the cycle containing $a_j$. In particular, no cycles
are split during this iterative construction. This defines a measure $P_x^{(z)}$ concentrated
on $D_n$. It is worth noting that these measures aren't very uniform. For example,
if $Fix(x) = \{a, b\}$ and $P_x[\sigma] > 0$, then $(a, b)$ is not in the cycle decomposition of
$\sigma$; if $|Fix(x)| = n - 2$, $P_x$ is concentrated on $n$-cycles. We're willing to give up
some uniformity to gain the following lemma, which is very useful for constructing
couplings:

Lemma 10 (Order Indifference). For $x \in S_n \setminus D_n$ with $Fix(x) = \{f_1, f_2, \dots, f_m\} \neq
[n]$, $A \subset D_n$, and $z, z' \in S_m$, we have $P_x^{(z)}[A] = P_x^{(z')}[A]$.

Proof. Observe that, under any ordering, $P_x^{(z)}[\sigma] \in \{0, \frac{(n-m-1)!}{(n-1)!}\}$, since each obtainable
element can be obtained in a unique way. Next, observe that the supports of $P_x^{(z)}$
and $P_x^{(z')}$ are the same. In cycle notation, they consist of exactly the derangements
that can be obtained by slotting the elements of $Fix(x)$ into the non-trivial cycles of
$x$. □
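For small $n$, the construction in equation (14) and the order indifference of Lemma 10 can be verified by brute-force enumeration. The sketch below is an illustration under assumptions: permutations are 0-indexed tuples with `perm[v]` the image of `v`, products are taken left to right as in the text, and the function names are hypothetical.

```python
from itertools import product

def times_transposition(perm, a, b):
    """Left-to-right product perm * (a, b): apply perm, then swap a and b."""
    swap = {a: b, b: a}
    return tuple(swap.get(v, v) for v in perm)

def is_derangement(perm):
    return all(perm[v] != v for v in range(len(perm)))

def outcome_counts(x, order):
    """Count how often each sigma arises from (14) over all choices of a_i.

    `order` lists the fixed points of x in the order f_{z(1)}, ..., f_{z(m)};
    a_i ranges over [n] minus the fixed points not yet inserted.
    """
    n = len(x)
    choices = []
    for i in range(len(order)):
        remaining = set(order[i:])
        choices.append([a for a in range(n) if a not in remaining])
    counts = {}
    for picks in product(*choices):
        sigma = x
        for a, f in zip(picks, order):
            sigma = times_transposition(sigma, a, f)
        counts[sigma] = counts.get(sigma, 0) + 1
    return counts

x = (1, 2, 0, 3, 4)            # the 3-cycle (0 1 2) with fixed points 3 and 4
counts_34 = outcome_counts(x, [3, 4])
counts_43 = outcome_counts(x, [4, 3])
```

Every outcome arises exactly once under either ordering, so both orderings give the uniform distribution on the same set of derangements.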
Using this lemma, define $P_x$ to be the single measure $P_x^{(z)}$ for some (any) ordering
$z$. If $x = id$, then let $P_{id}$ be uniform on $n$-cycles. These distributions are biased in
the sense discussed immediately before the lemma. However, we'll see that they are
'most biased' for permutations with a large number of fixed points, and there aren't
enough of those to be significant.

Having described the measures $P_x$, it is necessary to find couplings $P_{x,y}$ for $x, y$
adjacent. The goal will be to ensure that if $P_{x,y}[\sigma, \tau] > 0$, then the distance between
$\sigma$ and $\tau$ is small as measured by the graph metric on $D_n$ induced by the kernel $Q$.
Note that, since $x$ and $y$ are only a transposition away, we can assume without loss
of generality that $Fix(x) \subset Fix(y)$.

This observation gives an easy way to construct a coupling, as long as $x, y \neq id$.
Under the representation of $P_x$ given by equation (14), we order $Fix(x)$, then order
the elements of $Fix(y)$ to put the elements in $Fix(y) \setminus Fix(x)$ at the front, and the
remaining elements in the same order as given in $Fix(x)$. Then let $\{a_i\}_{i=1}^{|Fix(y)|}$ be the
random variables used to build $\tau$ from $P_y$ in representation (14). Construct $\sigma$ from
$P_x$ using the same choices for $a_i$ in representation (14) for all $i > |Fix(y)| -
|Fix(x)|$. This defines the coupling for $x, y \neq id$. For $y = id$ and $x = (i,j)$, we can
observe that $P_{id} = P_{(i,j)}$, so we choose the obvious coupling $P_{id,(i,j)}[\sigma, \tau] = P_{id}(\sigma) 1_{\sigma = \tau}$.

The next step is to define flows between all pairs $\sigma, \tau \in D_n$ such that $P_{x,y}[\sigma, \tau] > 0$
for at least some pair $x, y \in S_n$. We will often use the shorthand "assign weight $\beta$ to
path $\gamma_{\sigma,\tau}$" for "set $G_{\sigma,\tau}[\gamma_{\sigma,\tau}] = \beta$".



Case 1: $x, y \in D_n$. In this case, $P_{x,y}[\sigma, \tau] = \delta_{x,y}[\sigma, \tau]$. There is an edge between $\sigma$
and $\tau$, and so we assign weight 1 to that length-1 path.

Case 2: $x \in D_n$, $|Fix(y)| = 1$. Assume without loss of generality that $Fix(y) = \{i\}$.
Thus, $x = y(i,j)$ for some $j$. In this case, $P_{x,y}[\sigma, \tau] > 0$ if and only if $\sigma = x = y(i,j)$
and $\tau = y(i,k)$ for some $k \neq j$. Thus, we need to create a flow from $y(i,j)$ to $y(i,k)$
for all permutations $y$ with unique fixed point $i$ and all distinct $j, k \neq i$. For parity
reasons, it is clear that there are no paths of length 1. Let $C_j$ and $C_k$ be the cycles
containing $j$ and $k$ respectively in $y$. By assumption, these have size at least 2, so
write $C_j = (j, a_1, \dots, a_{\ell(j)})$ and $C_k = (k, b_1, \dots, b_{\ell(k)})$. There are three subcases to
consider:

Case 2A: Assume first that $C_j \neq C_k$.

Then $\{(y(i,j), y(i,j)(j,k)), (y(i,j)(j,k), y(i,j)(j,k)(i,j) = y(i,k))\}$ is a length-two
path between $y(i,j)$ and $y(i,k)$, and all vertices are clearly in $D_n$. Assign weight 1
to this path.

Case 2B: $C_j = C_k$, with $|C_j| > 2$. In this case, write $C = C_j = C_k =
(j, a_1, \dots, a_{\ell(j)}, k, b_1, \dots, b_{\ell(k)})$. If $\ell(j) > 0$, the path described in case 2A remains
in $D_n$, and so we assign weight 1 to this path. If $\ell(k) > 0$, an analogous path with
$j$'s and $k$'s switched and arrows reversed will remain in $D_n$. Since $|C_j| > 2$, we have
either $\ell(j) > 0$ or $\ell(k) > 0$. Thus, the only remaining case is:

Case 2C: $C_j = C_k = (j, k)$. In this case, we will use the assumption that $n > 5$.
Choose $h \in [n] \setminus \{i, j, k\}$, and write $C_h = (h, c_1, \dots, c_{\ell(h)})$ for some $\ell(h) \ge 1$. Also
write $S = \{i, j, k, h, c_1, \dots, c_{\ell(h)}\}$. We calculate:

$\sigma|_S = y(i,j)|_S = (i, j, k)(h, c_1, \dots, c_{\ell(h)})$
$y(i,j)(i,h)|_S = (i, j, k, h, c_1, \dots, c_{\ell(h)})$
$y(i,j)(i,h)(j,h)|_S = (j, k)(i, h, c_1, \dots, c_{\ell(h)})$
$y(i,j)(i,h)(j,h)(k,h)|_S = (i, k, j, h, c_1, \dots, c_{\ell(h)})$
$y(i,j)(i,h)(j,h)(k,h)(i,h)|_S = \tau|_S$

The five permutations described above, when restricted to $S^c$, are all equal.
These five permutations, without restriction to $S$, are all in $D_n$, and so for each
$h \in [n] \setminus \{i, j, k\}$ this sequence defines a length-4 path from $\sigma$ to $\tau$. In this case, we
put weight $\frac{1}{n-3}$ on each of these paths from $\sigma$ to $\tau$.
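The length-4 path of Case 2C can be checked mechanically. In the sketch below, which is an illustration under assumptions, permutations are 0-indexed tuples with `perm[v]` the image of `v`, products are taken left to right as in the text, and the concrete permutation `y` and the function names are hypothetical choices for the example.

```python
def times_transposition(perm, a, b):
    """Left-to-right product perm * (a, b): apply perm, then swap a and b."""
    swap = {a: b, b: a}
    return tuple(swap.get(v, v) for v in perm)

def is_derangement(perm):
    return all(perm[v] != v for v in range(len(perm)))

def case_2c_path(y, i, j, k, h):
    """Vertices of the path y(i,j) -> ... -> y(i,k) through the helper point h."""
    path = [times_transposition(y, i, j)]
    for a, b in [(i, h), (j, h), (k, h), (i, h)]:
        path.append(times_transposition(path[-1], a, b))
    return path

# y has unique fixed point i = 0, the 2-cycle (j, k) = (1, 2),
# and the 3-cycle (3, 4, 5) supplying the possible helper points h.
y = (0, 2, 1, 4, 5, 3)
i, j, k = 0, 1, 2
paths = {h: case_2c_path(y, i, j, k, h) for h in (3, 4, 5)}
```

Each of the three helper points gives a distinct path with the same endpoints, which matches the weight $\frac{1}{n-3}$ per path for $n = 6$.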

Case 3: x G D n , \Fix(y)\ = 2. Without loss of generality, write Fix(y) = 
{a, b}, so that x = y(a,b). Note also that since x G D n , P x [cr] = 8 x [a]. Write 
y in cycle notation as y = (p 1A , . . . ,pi,i(i))(p 2 ,i, • • • ,P2,e(2)) ■ ■ ■ (Pk,i, ■ ■ ■ ,Pk,£(k))(a)(b), 
where £(i) is the length of the i'th longest cycle, with ties broken lexicographically 
by smallest element. If P y [r] > 0, we can write r = y(a,Pi^j^)(b,Pi^),j(b)) or 
r = y(a,Pi( a ),j(a))(b,a), where in both cases 1 < i(a),i(b) < k, 1 < j(a) < £(i(a)), 
1 < j(b) < £(i(b)). This leads to three types of paths. 

Case 3A: $\tau = y(a, p_{i(a),j(a)})(b, p_{i(b),j(b)})$ with $i(a) \neq i(b)$. We define two paths from $\tau$ to $\sigma$ as follows. In both paths, the first derangement is $\tau$. The second is given by either $\tau(b, p_{i(a),j(a)})$ or $\tau(a, p_{i(b),j(b)})$. The symmetry between these two first steps being clear, we continue describing only the path beginning $(\tau, \tau(b, p_{i(a),j(a)}), \ldots)$. Note that the cycle structure of $\tau(b, p_{i(a),j(a)})$ is given by the cycle structure of $\tau$ with the two cycles

$$(p_{i(a),1}, \ldots, p_{i(a),j(a)-1}, a, p_{i(a),j(a)}, \ldots, p_{i(a),\ell(i(a))})$$

and

$$(p_{i(b),1}, \ldots, p_{i(b),j(b)-1}, b, p_{i(b),j(b)}, \ldots, p_{i(b),\ell(i(b))})$$

merged into the single cycle

$$(p_{i(a),j(a)}, p_{i(a),j(a)+1}, \ldots, p_{i(a),j(a)-1}, a, b, p_{i(b),j(b)}, \ldots, p_{i(b),j(b)-1}).$$

In particular, it is still a derangement. The next step on this path is $\tau(b, p_{i(a),j(a)})(a, p_{i(b),j(b)})$. The cycle structure of this permutation is obtained from that of $\tau(b, p_{i(a),j(a)})$ by splitting the large cycle $(p_{i(a),j(a)}, p_{i(a),j(a)+1}, \ldots, p_{i(a),j(a)-1}, a, b, p_{i(b),j(b)}, \ldots, p_{i(b),j(b)-1})$ into the smaller cycles $(a, b)$ and $(p_{i(a),j(a)}, p_{i(a),j(a)+1}, \ldots, p_{i(a),j(a)-1}, p_{i(b),j(b)}, \ldots, p_{i(b),j(b)-1})$. Again, this is a derangement. The final step is multiplying by $(p_{i(b),j(b)}, p_{i(a),j(a)})$ to get to $\sigma$. We assign weight $\frac{1}{2}$ to both paths. 
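
The Case 3A path can be checked on a small example. The following is a minimal sketch, not part of the original argument: it assumes permutations are stored as tuples on $\{0, \ldots, n-1\}$, that a product such as $\tau(b, p)$ means "apply the transposition first, then $\tau$", and it uses the hypothetical instance $y = (0\,1)(2\,3)(4)(5)$ with $a = 4$, $b = 5$, $p_{i(a),j(a)} = 0$ and $p_{i(b),j(b)} = 2$.

```python
def compose(f, g):
    """Return the permutation f∘g (apply g first), with permutations as tuples."""
    return tuple(f[g[x]] for x in range(len(f)))


def transposition(n, u, v):
    """Return the transposition (u v) as a tuple on {0, ..., n-1}."""
    t = list(range(n))
    t[u], t[v] = v, u
    return tuple(t)


def times(perm, u, v):
    """Right-multiply perm by the transposition (u v)."""
    return compose(perm, transposition(len(perm), u, v))


def is_derangement(perm):
    """True when perm has no fixed point."""
    return all(perm[x] != x for x in range(len(perm)))


def case_3a_path(y, a, b, pa, pb):
    """Vertices tau, tau(b,pa), tau(b,pa)(a,pb), tau(b,pa)(a,pb)(pb,pa)."""
    tau = times(times(y, a, pa), b, pb)
    v1 = times(tau, b, pa)
    v2 = times(v1, a, pb)
    v3 = times(v2, pb, pa)
    return [tau, v1, v2, v3]


# Hypothetical example: y = (0 1)(2 3)(4)(5), so Fix(y) = {4, 5}.
y = (1, 0, 3, 2, 4, 5)
path = case_3a_path(y, a=4, b=5, pa=0, pb=2)
sigma = times(y, 4, 5)  # sigma = x = y(a, b)
```

In this instance every vertex of the path is a derangement and the last vertex equals $\sigma = y(a,b)$, matching the description above.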
Case 3B: $\tau = y(a, p_{i(a),j(a)})(b, p_{i(b),j(b)})$ with $i(a) = i(b)$ and $\tau(a) \neq b$, $\tau(b) \neq a$. We create the following two paths from $\tau$ to $\sigma$. The first vertex is $\tau$. The second vertex is $\tau(a,b)$, which has the same cycle structure as $\tau$, with the long cycle $(p_{i(a),j(a)-1}, a, p_{i(a),j(a)}, \ldots, p_{i(a),j(b)-1}, b, p_{i(a),j(b)}, \ldots)$ split into the cycles $(a, p_{i(a),j(a)}, \ldots, p_{i(a),j(b)-1})$ and $(b, p_{i(a),j(b)}, \ldots, p_{i(a),j(a)-1})$. By the assumption that $a$ and $b$ were not adjacent in the large cycle, both of the small cycles are of size at least 2, so this is a derangement. The next vertex should be either $\tau(a,b)(p_{i(a),j(a)}, b)$ or $\tau(a,b)(p_{i(a),j(b)}, a)$. As in case 3A, there is obvious symmetry after relabelling $a$ and $b$, and we will continue the description of the first of these paths. Note that $\tau(a,b)(p_{i(a),j(a)}, b)$ is obtained from $\tau(a,b)$ by merging the cycles $(a, p_{i(a),j(a)}, \ldots, p_{i(a),j(b)-1})$ and $(b, p_{i(a),j(b)}, \ldots, p_{i(a),j(a)-1})$ into the single cycle $(a, b, p_{i(a),j(b)}, \ldots, p_{i(a),j(a)-1}, p_{i(a),j(a)}, \ldots, p_{i(a),j(b)-1})$. This is clearly a derangement. Finally send $\tau(a,b)(p_{i(a),j(a)}, b)$ to $\sigma$ by multiplying by the transposition $(a, p_{i(a),j(b)})$. The path with the other middle edge is analogous; we assign weight $\frac{1}{2}$ to both paths. 

Case 3C: This covers the cases $\tau = y(a, p_{i(a),j(a)})(b, a)$ and $\tau = y(a, p_{i(a),j(a)})(b, p_{i(b),j(b)})$ with $i(a) = i(b)$ and either $\tau(a) = b$ or $\tau(b) = a$. In this case, $\sigma$ is adjacent to $\tau$, and in particular $\sigma = \tau(p_{i(a),j(a)}, b)$. The flow should put all weight on this length-1 path. 

The next step is to look at the cases where $x, y \in S_n \setminus D_n$. There will be 3 cases, depending on whether $|\mathrm{Fix}(x)| = |\mathrm{Fix}(y)|$, $|\mathrm{Fix}(x)| = |\mathrm{Fix}(y)| - 1$, or $|\mathrm{Fix}(x)| = |\mathrm{Fix}(y)| - 2$. These will turn out to be very similar to cases 1 through 3 above, with slightly more complicated notation. In particular, all paths will again be of length at most 4. 

Case 4: $|\mathrm{Fix}(x)| = |\mathrm{Fix}(y)|$. In this case, we can assume without loss of generality that $y$ has the same cycle structure as $x$ with the $i$'th cycle, $(p_{i,1}, \ldots, p_{i,\ell(i)})$, split into the two cycles $(p_{i,1}, \ldots, p_{i,a})$ and $(p_{i,a+1}, \ldots, p_{i,\ell(i)})$. Let $\mathrm{Fix}(x) = \mathrm{Fix}(y) = F = \{f_1, \ldots, f_m\}$. If $P_{x,y}[\sigma, \tau] > 0$, then $\sigma$ and $\tau$ have the same cycle structure, except that the single cycle including $p_{i,1}, \ldots, p_{i,\ell(i)}$ in that order in $\sigma$ is split into two cycles in $\tau$. One contains $p_{i,1}, \ldots, p_{i,a}$ in that order and the other contains $p_{i,a+1}, \ldots, p_{i,\ell(i)}$ in that order. Both will have some elements of $F$ interspersed between the elements of the form $p_{i,q}$, but these interspersed elements will also always be in the same order in $\sigma$ and $\tau$. 
In particular, for some $0 \le a \le m$, some $z \in S_m$ and $\phi : [a] \to [\ell(i)]$ we can write $S = \{p_{i,1}, \ldots, p_{i,\ell(i)}\} \cup \{f_{z(1)}, \ldots, f_{z(a)}\}$ and

$$\sigma|_S = x \prod_{k=1}^{a} (p_{i,\phi(k)}, f_{z(k)})$$

$$\tau|_S = x\,(p_{i,1}, p_{i,a+1}) \prod_{k=1}^{a} (p_{i,\phi(k)}, f_{z(k)})$$

It is easy to check that $\sigma$ and $\tau$ are adjacent, and in fact $\sigma = \tau(\tau[p_{i,a}], \tau[p_{i,\ell(i)}])$. Assign weight 1 to this length-1 path. 

Case 5: $|\mathrm{Fix}(x)| = |\mathrm{Fix}(y)| - 1$. Assume without loss of generality that $\mathrm{Fix}(x) = \{f_1, \ldots, f_m\}$ and $\mathrm{Fix}(y) = \{i, f_1, \ldots, f_m\}$. Therefore, $x = y(i,j)$ for some $j \neq i$, and $P_{x,y}[\sigma, \tau] > 0$ if and only if $\sigma, \tau$ can be written in the form 

(15) $$\sigma = y(i,j) \prod_{s=1}^{m} (a_s, f_s)$$

(16) $$\tau = y(i,k) \prod_{s=1}^{m} (a_s, f_s)$$


with $a_s \in [n] \setminus \bigcup_{t > s} \{f_t\}$. We will construct paths very similar to those in Case 2. For $a \in [n]$, let $F(a) = (f_1(a), \ldots, f_{\ell'(a)}(a))$ be the subset of $F$ with the property that $f_s \in F(a)$ if $a_s = a$ or $a_s \in F(a)$, using the representation (15). We order $F(a)$ by the indices, so that if $f_i, f_j \in F(a)$ with $i < j$, then $f_i$ is before $f_j$ in the list $F(a)$. When $F(a)$ is empty, define $f_1(a) = a$. 

Next, say that a directed edge $(\eta, \kappa)$ in $S_n$ is defined by the transposition $(q,r)$ if $\eta^{-1}\kappa = (q,r)$. We will be changing paths by changing the transpositions that define their edges. In general, if $\gamma_{\sigma,\tau} = (\sigma, \sigma(q_1, r_1), \ldots, \sigma\prod_{i=1}^{k}(q_i, r_i) = \tau)$ is a path from $\sigma$ to $\tau$, we say that $\gamma_{\sigma',\tau'} = (\sigma', \sigma'(q_1', r_1'), \ldots, \sigma'\prod_{i=1}^{k}(q_i', r_i') = \tau')$ is the path from $\sigma'$ to $\tau'$ obtained by replacing all edges defined by $(q_i, r_i)$ with edges defined by $(q_i', r_i')$. To define the flows in case 5, we will take paths from case 2 and replace all edges defined by transpositions $(q, r)$ with edges defined by transposition $(f_1(q), f_1(r))$. 
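
The edge-replacement operation can be sketched in a few lines. This is an illustrative sketch only: the tuple representation, the convention that $\sigma(q,r)$ applies the transposition first, and the relabelling table `label` (standing in for the map $q \mapsto f_1(q)$) are assumptions made for the example, not notation from the paper.

```python
def compose(f, g):
    """Return f∘g (apply g first), with permutations as tuples on {0, ..., n-1}."""
    return tuple(f[g[x]] for x in range(len(f)))


def inverse(f):
    """Return the inverse permutation of f."""
    inv = [0] * len(f)
    for x, fx in enumerate(f):
        inv[fx] = x
    return tuple(inv)


def transposition(n, u, v):
    """Return the transposition (u v) as a tuple."""
    t = list(range(n))
    t[u], t[v] = v, u
    return tuple(t)


def path_vertices(start, transpositions):
    """Vertices start, start(q1,r1), start(q1,r1)(q2,r2), ... of a path."""
    vertices = [start]
    for q, r in transpositions:
        vertices.append(compose(vertices[-1], transposition(len(start), q, r)))
    return vertices


def defining_transpositions(vertices):
    """For each edge (eta, kappa), recover the pair moved by eta^{-1} kappa."""
    pairs = []
    for eta, kappa in zip(vertices, vertices[1:]):
        step = compose(inverse(eta), kappa)
        moved = [x for x in range(len(step)) if step[x] != x]
        assert len(moved) == 2, "edge is not defined by a transposition"
        pairs.append(tuple(moved))
    return pairs


def relabel(transpositions, label):
    """Replace each (q, r) by (label[q], label[r])."""
    return [(label[q], label[r]) for q, r in transpositions]


# Hypothetical example on n = 5 points.
sigma = (1, 2, 0, 3, 4)
edges = [(0, 1), (1, 2)]
old_path = path_vertices(sigma, edges)

label = [3, 1, 2, 0, 4]          # assumed relabelling, playing the role of f_1
new_edges = relabel(edges, label)
new_path = path_vertices((0, 1, 2, 3, 4), new_edges)
```

The new path has the same length as the old one; only the transpositions defining its edges have changed.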

We will say this more carefully for the analogue to case 2A. Assume $j, k$ aren't in the same cycle in $y$, and let $S$ be the union of all elements in cycles containing $i$, $j$ or $k$ in $\sigma$ or $\tau$. Then 

$$\sigma|_S = (F(i), i, F(j), j, b_1, \ldots, b_{\ell(j)})(F(k), k, c_1, \ldots, c_m)$$
$$\tau|_S = (F(i), i, F(k), k, c_1, \ldots, c_m)(F(j), j, b_1, \ldots, b_{\ell(j)})$$
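And so we write the path 

$$\{(\sigma, \sigma(f_1(j), f_1(k))), (\sigma(f_1(j), f_1(k)), \sigma(f_1(j), f_1(k))(f_1(i), f_1(j)))\}.$$

This replaces the analogous path $\{(\sigma, \sigma(j,k)), (\sigma(j,k), \sigma(j,k)(i,j))\}$ from case 2A. As in case 2B, we will use the same path if $j, k$ are in the same cycle in $y$ and $(i,j,k) \notin \sigma, \tau$. 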

If (i,j,k) G a, t, the same discussion as in case 2C shows that the path which 
goes through a, 

v(fi(i), fi(h)), a(fi(i), fi(h))(h(j), h(h)), aihii), fi(h))(fi(k), fi(h)) 

and finally <r(/i(i), h{h)){h{j), h{h)){h{k), = r remains in L> n . 

Case 6: $|\mathrm{Fix}(x)| = |\mathrm{Fix}(y)| - 2$. Just as case 5 is very similar to case 2, case 6 is very similar to case 3. Assume $\mathrm{Fix}(y) = \{f_1, \ldots, f_m\}$ and $\mathrm{Fix}(x) = \{c, d, f_1, \ldots, f_m\}$. In particular, $y = x(c, d)$. Write $y$ in cycle notation, as in case 3. Then if $P_{x,y}[\sigma, \tau] > 0$, analogously to case 3, we can write the pair $(\sigma, \tau)$ in one of the following three ways: 
Case 6A: 

$$\tau = y \prod_{s=1}^{m} (a_s, f_s), \qquad \sigma = y\,(c, p_{i(c),j(c)})(d, p_{i(d),j(d)}) \prod_{s=1}^{m} (a_s, f_s)$$

with $i(c) \neq i(d)$, 
Case 6B: 

$$\tau = y \prod_{s=1}^{m} (a_s, f_s), \qquad \sigma = y\,(c, p_{i(c),j(c)})(d, p_{i(d),j(d)}) \prod_{s=1}^{m} (a_s, f_s)$$

with $i(c) = i(d)$, or 
Case 6C: 

$$\tau = y \prod_{s=1}^{m} (a_s, f_s), \qquad \sigma = y\,(c, p_{i(c),j(c)})(d, c) \prod_{s=1}^{m} (a_s, f_s)$$

where in each case $a_s \in [n] \setminus \bigcup_{t > s} \{f_t\}$. These three possibilities correspond exactly to those in cases 3A, 3B and 3C respectively. Just as in case 5, we define flows by taking the paths in cases 3A, 3B and 3C and substituting an edge defined by transposition $(f_1(q), f_1(r))$ for any edge defined by transposition $(q, r)$ in case 3. 



Having defined the measures, couplings, and flows, we will now bound the comparison constant $A$. We will do this by bounding separately the edges that appear in paths between $\sigma$ and $\tau$ with $P_{x,y}[\sigma, \tau] > 0$ and $x, y$ satisfying the conditions from cases 1 through 6 above. To be more precise, for $T \in \{1, 2, \ldots, 6\}$ and $x, y \in S_n$, say that $(x,y) \in T$ if $x^{-1}y$ is a transposition and $x, y$ satisfy the conditions of case $T$ above. We then write: 

A= sup (1 + 2 k [lxA G xAlx,X^ P v\ z \ 

+ Y G *AiaA k haA Y p *,v\ a A) 

< 1 + SUp Y Y k[lx,z\ G x,zhx,z] Y 

{q ' r) T l x ,zB(q,r),(x,z)eF y^G 

+ ^Y Yl G aAla,b\ k haA Y P x,y[ a A) 

(9 ' r) F la,bB(q,r) (x,y)€E,x,yt£G,(x,y)eT 



We will then separately bound the weights associated with each of the 6 cases in this 
sum. In principle, this part of the argument is the same as the weight-counting at 
the end of the proof of Theorem 5; it is only the larger number of terms in each case 
that makes it more complicated. 

Case 1: $(q,r) \in \gamma_{\sigma,\tau}$, where $G_{\sigma,\tau}[\gamma_{\sigma,\tau}] > 0$ and $P_{x,y}[\sigma, \tau] > 0$ for some $x, y \in D_n$. This implies that in fact $(q, r) = (x, y)$, and so the total weight in this case is exactly 1. 
Case 2: $(q,r) \in \gamma_{\sigma,\tau}$, where $G_{\sigma,\tau}[\gamma_{\sigma,\tau}] > 0$ and $P_{x,y}[\sigma, \tau] > 0$ for some $x \in D_n$, $|\mathrm{Fix}(y)| = 1$. Assume $\mathrm{Fix}(y) = \{i\}$. Then $x = y(i,j)$ for some $j$, and we can write $\sigma = x = y(i,j)$, $\tau = y(i,k)$. If $x, y$ correspond to case 2A or 2B above, the path between $\sigma$ and $\tau$ passes through the permutations $\{y(i,j), y(i,j)(j,k), y(i,j)(j,k)(i,j)\}$. Thus, either $(q,r) = (y(i,j), y(i,j)(j,k))$ or $(q,r) = (y(i,j)(j,k), y(i,j)(j,k)(i,j))$. 

First, assume $(q,r) = (y(i,j), y(i,j)(j,k))$. Then $q(i,j)$ and $r(j,k)(i,j)$ have a fixed point at $i$. This means that $q[j] = i$ and $r[k] = i$. In particular, once $q$, $r$ and $i$ have been fixed, so are $j$ and $k$. Thus, for fixed $q, r$ there are at most $n - 1$ choices for $i$, and these choices determine $x$, $y$, $\sigma$ and $\tau$. Since $P_{x,y}[\sigma, \tau] \in \{0, \frac{1}{n-1}\}$, this means that the total weight is at most $(n-1)\frac{1}{n-1} = 1$. 

If (q,r) = (y(i,j)(j,k),y(i,j)(j,k)(i,j)), a similar computation gives the same 
conclusion. Thus, the total weight for any given edge (q, r) coming from pairs x, y in 
case 2A or 2B is at most 2. 

If $x, y$ correspond to case 2C, there are four possibilities for the pair $(q,r)$, as described in case 2C. Following the notation in that case, we look at the first possibility, $(q,r) = (y(i,j), y(i,j)(i,h))$. Note that $q[i] = j$, and that $h$ is the image under $r$ of an element determined by $q$ and $i$. In particular, once $q$, $r$, $k$ and $i$ have been fixed, $j = q[i]$ can be computed from them, and this information can be used to compute $h$. Thus, for fixed $q, r$ there are at most $(n-1)(n-2)$ choices of distinct $i, k$, and these choices determine $x$, $y$, $\sigma$ and $\tau$. Since $P_{x,y}[\sigma, \tau] \in \{0, \frac{1}{n-1}\}$ and the weight of any particular path in case 2C is $\frac{1}{n-3}$, the total weight assigned to any first edge $(q,r)$ by such a path is at most $\frac{n-2}{n-3}$. A similar analysis with the same conclusion applies to the other 3 edges of the length-4 paths described in case 2C. 

We conclude that the total weight assigned to any edge $(q, r)$ by vertices $(x, y)$ covered by case 2 is at most $6\frac{n-2}{n-3}$. 

Case 3: $(q,r) \in \gamma_{\sigma,\tau}$, where $G_{\sigma,\tau}[\gamma_{\sigma,\tau}] > 0$ and $P_{x,y}[\sigma, \tau] > 0$ for some $x \in D_n$, $|\mathrm{Fix}(y)| = 2$. Write in this case $\mathrm{Fix}(y) = \{a, b\}$, so $x = y(a,b)$. There are three possibilities for pairs $(\sigma, \tau)$ with $P_{x,y}[\sigma, \tau] > 0$, corresponding to cases 3A, 3B and 3C. We keep the same cycle notation as in case 3 above and begin by looking at case 3A. In this case, the path from $\tau$ to $\sigma$ has the form 

$$\tau \to \tau(b, p_{i(a),j(a)})$$
$$\to \tau(b, p_{i(a),j(a)})(a, p_{i(b),j(b)})$$
$$\to \tau(b, p_{i(a),j(a)})(a, p_{i(b),j(b)})(p_{i(b),j(b)}, p_{i(a),j(a)}) = \sigma$$

Note that the pair $(q, r)$ can be any of the 3 edges defined by this path. Look for now at edges of the form $(q,r) = (\tau, \tau(b, p_{i(a),j(a)}))$. In this case, since $\tau = y(a, p_{i(a),j(a)})(b, p_{i(b),j(b)})$, where $y$ has fixed points at $a$ and $b$, we note that $q(b, p_{i(b),j(b)})(a, p_{i(a),j(a)})$ and $r(b, p_{i(a),j(a)})(b, p_{i(b),j(b)})(a, p_{i(a),j(a)})$ also have fixed points at $a$ and $b$. In particular, $q[a] = p_{i(a),j(a)}$, $q[b] = p_{i(b),j(b)}$, $r[a] = b$ and $r[b] = p_{i(b),j(b)}$. Thus, once $q$, $r$, $a$, and $b$ have been fixed, they determine $p_{i(a),j(a)}$, $p_{i(b),j(b)}$, $\sigma$, and $\tau$. For fixed $q, r$, there are at most $\frac{n(n+1)}{2}$ choices of $(a, b)$. On the other hand, for a given $(a,b)$ the probability $P_{x,y}[\sigma, \tau]$ for any pair $\sigma, \tau$ associated with $a, b$ is at most $\frac{1}{(n-1)(n-2)}$. Thus, the total weight assigned to this path is at most $\frac{n(n+1)}{2(n-1)(n-2)}$. Comparing the pair $(q, r)$ to the remaining two edges of the path, the same phenomenon holds: the fact that $y$ has two fixed points means that the choice of two parameters determines the entire path, and so again the weight given to these edges is at most $\frac{n(n+1)}{2(n-1)(n-2)}$. Combining these bounds, we see that this case gives a total weight of at most $3\frac{n(n+1)}{2(n-1)(n-2)}$. 

Looking at the second case, $\tau = y(a, p_{i(a),j(a)})(b, p_{i(b),j(b)})$ with $i(a) = i(b)$, gives the same congestion bound of $3\frac{n(n+1)}{2(n-1)(n-2)}$ with essentially the same proof. The third case, $\tau = y(a, p_{i(a),j(a)})(a,b)$, is essentially the same as case 1. As in that case, we have $\sigma$ and $\tau$ adjacent and again determined by the choice of $(a, b)$. Thus, the total congestion in this case is at most 1 for $n > 6$. 

Putting these bounds together, the total weight for any given edge $(q, r)$ coming from pairs $(x,y)$ in this case is at most $1 + 6\frac{n(n+1)}{2(n-1)(n-2)}$. 

Case 4: $(q,r) \in \gamma_{\sigma,\tau}$, where $G_{\sigma,\tau}[\gamma_{\sigma,\tau}] > 0$ and $P_{x,y}[\sigma, \tau] > 0$ for some $x, y \notin D_n$, $|\mathrm{Fix}(x)| = |\mathrm{Fix}(y)|$. As noted in the coupling description for case 4, this means $(q, r) = (\sigma, \tau)$. We now determine the total weight given to the pair $(\sigma, \tau)$ by all pairs $x, y$ with $|\mathrm{Fix}(x)| = j$. 

First, note that for any particular pair x, y with \Fix(x) \ = j, we have P x , y [cr, t] < 
(^Zi); ■ Next, note that there are at most sucn P & i rs x -> V f° r which P x ^y\<y, r] > 0; 

we obtain the bound by noting this is the number of ways to choose the j elements 
of Fix(x). Summing over the size j of Fix(x) = Fix(y), the total weight assigned to 
cr, r is at most 
^j\(n-j)\(n-l)\ 



J 

n-2 



•sr-^ n 1 

~[ 3 ( n ~ j) 

n 

2 j ri-2 



,)= 

< 1 + 2(e- 1) 

where the last inequality only applies for n sufficiently large that n 2 < |!, e.g. 
n > 10 suffices. Thus, the total weight for any given edge (q, r) coming from pairs 
(x, y) in this case is at most 2e + f . 

Case 5: (q, r) G 7<j iT , where G<t, t [7o-,t] > and P x>y [a,T] > for some x, y £ D n , 
\Fix(y)\ = \Fix(x) \ — 1. The counting of paths for fixed \Fix(y)\ is as in case 2, and 
finding the weights of each path and summing is as case 4. More precisely, as in case 
4, if \Fix(y)\ = j, the weight assigned to (a, r) by P X:V is at most ("zyj]- By the same 
argument as in case 2, the total weight for any edge (q,r) G 7 CT)T is at most 6^5§, 
and there are at most ji^Ljy. P & i rs { x > y) f° r which P XjV [cr, r] > 0. Thus, a calculation 
analogous to the bound on W in case 4 gives a total weight of at most 6(2e + l)*jz§- 
Combining these arguments gives a total weight of at most 6(2e + 1)^E§- 

Case 6: (q, r) G "f a)T , where G ffiT [7 ffiT ] > and P Xty [a,r] > for some x,y ^ _D n , 
= |.Fu;(a;)| + 2. Combining the arguments of 4 and 3 in the same way 
that case 5 combined the arguments of cases 4 and 2 gives a total weight of at most 
(2e + l)(l + 3(^±^). 

Putting together the 6 bounds, and noting that all paths are of length at most 4, we have $A \le 2\left(e + \frac{1}{2}\right)\left(2 + 6\frac{n-2}{n-3} + 3\frac{n(n+1)}{(n-1)(n-2)}\right)$. This proves the upper bound in Theorem [7]. 

To prove the lower bound, define for < e < f the distribution n e on S n by 7r e (a) = 
Z for a G D n and 7T e (cr) = Ze for cr G S n \D n , where ^ < Z < is a normalizing 
constant. Then define the kernel Q e to be the Metropolis kernel associated with base 
chain Q and stationary distribution ir e . Let 1 = /3 e ,o > /?ei > • • • > /3 en ! > be the 
spectrum of $Q_\epsilon$. Note that $Q_1 = Q$, and $Q_0 = K$ when restricted to $D_n$. By Cauchy's interlacing theorem, if we denote by $\beta_1$ the second-largest eigenvalue of $K$, we have 

$$1 - \beta_{0,1} \le 1 - \beta_1.$$
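
The family $Q_\epsilon$ can be made concrete on a very small symmetric group. The sketch below is illustrative only and rests on assumptions that are not taken from the paper: the base chain is a non-lazy walk that multiplies by a uniformly chosen transposition, $\epsilon$ is strictly positive, and the standard Metropolis acceptance rule $\min(1, \pi_\epsilon(t)/\pi_\epsilon(s))$ is used. The normalizing constant $Z$ cancels in the acceptance ratio, so unnormalized weights suffice.

```python
from fractions import Fraction
from itertools import combinations, permutations


def metropolis_kernel(n, eps):
    """Metropolis kernel on S_n targeting weight 1 on derangements and eps elsewhere.

    Assumes eps > 0 and a base chain that right-multiplies by a uniform transposition.
    """
    states = list(permutations(range(n)))
    pairs = list(combinations(range(n), 2))

    def weight(s):
        # Unnormalized stationary weight: 1 on D_n, eps on S_n \ D_n.
        return Fraction(1) if all(s[i] != i for i in range(n)) else Fraction(eps)

    kernel = {s: {t: Fraction(0) for t in states} for s in states}
    for s in states:
        for u, v in pairs:
            t = list(s)
            t[u], t[v] = s[v], s[u]          # t = s composed with (u v)
            t = tuple(t)
            accept = min(Fraction(1), weight(t) / weight(s))
            kernel[s][t] += Fraction(1, len(pairs)) * accept
        # Rejected proposals become holding probability.
        kernel[s][s] += 1 - sum(kernel[s].values())
    return states, weight, kernel


states, weight, kernel = metropolis_kernel(3, Fraction(1, 4))
_, _, base = metropolis_kernel(3, Fraction(1))   # eps = 1 recovers the base chain
```

With $\epsilon = 1$ every proposal is accepted, which is the sense in which $Q_1 = Q$ in this toy setting.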
Next, we compare $Q_\epsilon$ and $Q_{\epsilon'}$ for $\epsilon > \epsilon' > 0$. Since $Q_\epsilon[\sigma, \tau] > 0$ if and only if $Q_{\epsilon'}[\sigma, \tau] > 0$, all paths can be made length 1. For this choice of paths, there are three types of edges to look at: those between two elements of $D_n$, those between an element of $D_n$ and an element of $S_n \setminus D_n$, and finally those between two elements of $S_n \setminus D_n$. In all three cases, the congestion and path length is 1. In the first case, 

^y^-^eiz)Q e (z,w) < 2e. In the second case, ne[z) Q e{z ^ e'{z)Q e {z, w) < 2e e - t < 

2e. In the third case, ir ^ z) Q^ ZjW) ^ e '(z)Q e (z,w) < 2e ( J) < 2e. Thus, e Qe (f,f) < 
2ee Ql (/, /) for all e > 0. Taking the limit as e goes to 0, e Qo (f,f) < 2ee Ql (f,f). 
Again, by Cauchy's interlacing theorem, ex(f,f) < 2eeQ 1 (/,/)• This completes the 
proof of Theorem [7} □ 

To prove Corollary [8j we note that 



mm 



W) 



> minl32( 



1 + e 



> 264(1 + e) 



n 



< 3 



Where the first inequality is due to Lemma |2| Theorem 7jand the fact that y L) 
for n > 10, and the second inequality is due to the spectral gap estimate in section 



\s n \ 



9.2 of [22]. 

a{K) = fi( 



The bound on a(Q) is found the same way, and relies on the bound 



i 

n log(n) 



found in 



7. Reversibility and Laziness Assumptions 

Theorem [4] doesn't make use of either the reversibility or $\frac{1}{2}$-laziness assumptions. However, in order to obtain total variation mixing time bounds, we used Theorem [1], which uses both assumptions. In this section, we briefly discuss the common techniques for avoiding these assumptions. To use comparison without reversibility, Cheeger's inequality is often used. In particular, section 4 of [10] applies without modification to the setting of this paper. 

Avoiding the laziness assumption requires slightly more work, but is more effective. For non-lazy chains, the term $\beta_1(P)$ in Theorem [1] may be replaced by $\max(\beta_1(P), |\beta_{|X|}(P)|)$. To estimate $|\beta_{|X|}(P)|$, define the following analogue to the Dirichlet form: 

$$\mathcal{F}_P(f,f) = ((I + P)f, f) = \frac{1}{2} \sum_{x,y \in X} |f(x) + f(y)|^2 P(x,y)\pi(x)$$
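
The two expressions for this form can be compared numerically. The following sketch is an illustration under stated assumptions: the chain is a random walk on a small weighted graph with symmetric integer weights (a hypothetical example, chosen so that $\pi$ is proportional to the weighted degree), the inner product is taken with respect to $\pi$, and exact rational arithmetic is used.

```python
from fractions import Fraction


def weighted_walk(weights):
    """Random walk on a graph with symmetric weights; pi is proportional to degree."""
    n = len(weights)
    degree = [sum(row) for row in weights]
    total = sum(degree)
    P = [[Fraction(weights[x][y], degree[x]) for y in range(n)] for x in range(n)]
    pi = [Fraction(degree[x], total) for x in range(n)]
    return P, pi


def inner_form(P, pi, f):
    """((I + P) f, f) with the inner product weighted by pi."""
    n = len(f)
    return sum(
        (f[x] + sum(P[x][y] * f[y] for y in range(n))) * f[x] * pi[x]
        for x in range(n)
    )


def edge_form(P, pi, f):
    """(1/2) * sum over x, y of (f(x) + f(y))^2 P(x, y) pi(x)."""
    n = len(f)
    return Fraction(1, 2) * sum(
        (f[x] + f[y]) ** 2 * P[x][y] * pi[x] for x in range(n) for y in range(n)
    )


# Hypothetical 3-state example with a self-loop at state 1.
weights = [[0, 2, 1], [2, 1, 3], [1, 3, 0]]
P, pi = weighted_walk(weights)
f = [Fraction(1), Fraction(-2), Fraction(3)]
```

For this example `inner_form(P, pi, f)` and `edge_form(P, pi, f)` agree exactly; the identity only uses the stationarity of $\pi$.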

By inequality 2.3 of Jsj , if 7 K (f, f) < AFq(/, /) and C = sup yen j^, we have 

Pi(Q)>P\a\(Q) 

> -1 + j(l + P\n\(K)) 

>-l + ^(l + %|W) 

As per the comments immediately following Lemma 2, it is possible to obtain 
bounds on more eigenvalues if there is a more structured relationship between /, /. 

Define paths, flows, extensions, and couplings as in the proof of Theorem [4], with the added requirement that flows must be concentrated on paths $\gamma$ with $|\gamma|$ odd. For a given edge $e$ in path $\gamma$, let $t_e(\gamma)$ be the number of times that $e$ is traversed in $\gamma$. In Theorem [4], we could assume without loss of generality that this was at most 1; in the present context, we can assume that it is at most 2. Then we have the following comparison result: 

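Theorem 11 (Comparison of Forms for General Chains). For flows, distributions and couplings as described above, 

$$\mathcal{F}_K(\tilde{f}, \tilde{f}) \le A\, \mathcal{F}_Q(f, f)$$

where 

$$A = \sup_{Q(q,r) > 0} \frac{1}{Q(q,r)\nu(q)} \Big( \sum_{\gamma_{x,y} \ni (q,r)} G_{x,y}[\gamma_{x,y}]\, t_{(q,r)}(\gamma_{x,y})\, k[\gamma_{x,y}]\, K(x,y)\mu(x)$$
$$+ 2 \sum_{\gamma_{x,z} \ni (q,r)} t_{(q,r)}(\gamma_{x,z})\, k[\gamma_{x,z}]\, G_{x,z}[\gamma_{x,z}] \sum_{y \notin \hat{\Omega}} P_y[z] K(x,y)\mu(x)$$
$$+ \sum_{\gamma_{a,b} \ni (q,r)} G_{a,b}[\gamma_{a,b}]\, t_{(q,r)}(\gamma_{a,b})\, k[\gamma_{a,b}] \sum_{(x,y) \in E,\ x,y \notin \hat{\Omega}} P_{x,y}[a,b] K(x,y)\mu(x) \Big)$$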
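Proof. Start by writing 

$$\mathcal{F}_K(\tilde{f}, \tilde{f}) = \frac{1}{2} \sum_{x,y \in \hat{\Omega}} |f(x) + f(y)|^2 K(x,y)\mu(x) + \sum_{x \in \hat{\Omega},\ y \notin \hat{\Omega}} |f(x) + \tilde{f}(y)|^2 K(x,y)\mu(x)$$
$$+ \frac{1}{2} \sum_{x,y \notin \hat{\Omega}} |\tilde{f}(x) + \tilde{f}(y)|^2 K(x,y)\mu(x)$$
$$= \frac{1}{2} R_1 + R_2 + \frac{1}{2} R_3$$

The goal is to compare this to $\mathcal{F}_Q(f, f) = \frac{1}{2} \sum_{x,y \in \hat{\Omega}} |f(x) + f(y)|^2 Q(x,y)\nu(x)$. We begin by looking at $R_1$ (note that the assumption of odd path length is used on the second line): 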



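$$R_1 = \sum_{x,y \in \hat{\Omega}} |f(x) + f(y)|^2 K(x,y)\mu(x)$$
$$= \sum_{x,y \in \hat{\Omega}} \Big| \sum_{\gamma \in \Gamma_{x,y}} G_{x,y}(\gamma) \sum_{i=0}^{k[\gamma]-1} (-1)^i \big(f(v_{x,y,i+1}) + f(v_{x,y,i})\big) \Big|^2 K(x,y)\mu(x)$$
$$\le \sum_{x,y \in \hat{\Omega}} \sum_{\gamma \in \Gamma_{x,y}} G_{x,y}(\gamma) \Big| \sum_{i=0}^{k[\gamma]-1} (-1)^i \big(f(v_{x,y,i+1}) + f(v_{x,y,i})\big) \Big|^2 K(x,y)\mu(x)$$
$$\le \sum_{x,y \in \hat{\Omega}} \sum_{\gamma \in \Gamma_{x,y}} G_{x,y}(\gamma)\, k[\gamma] \sum_{i=0}^{k[\gamma]-1} \big(f(v_{x,y,i+1}) + f(v_{x,y,i})\big)^2 K(x,y)\mu(x)$$

And so the coefficient of $(f(q) + f(r))^2$ in $R_1$ is at most 

(17) $$[(f(q) + f(r))^2] R_1 \le \sum_{\gamma_{x,y} \ni (q,r)} G_{x,y}(\gamma_{x,y})\, t_{(q,r)}(\gamma_{x,y})\, k[\gamma_{x,y}]\, K(x,y)\mu(x)$$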
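
The role of the odd-length requirement in the bound on $R_1$ can be seen in a small computation. The sketch below is only an illustration with made-up values of $f$ along a path $v_0, \ldots, v_k$: the alternating sum of edge terms $f(v_{i+1}) + f(v_i)$ telescopes to $f(v_0) + f(v_k)$ when $k$ is odd, and to $f(v_0) - f(v_k)$ when $k$ is even, and Cauchy-Schwarz bounds its square by $k$ times the sum of squared edge terms.

```python
def alternating_sum(values):
    """Sum over i of (-1)^i * (f(v_{i+1}) + f(v_i)) for f-values along a path."""
    return sum(
        (-1) ** i * (values[i + 1] + values[i]) for i in range(len(values) - 1)
    )


def cauchy_schwarz_bound(values):
    """k times the sum of squared edge terms, where k is the number of edges."""
    k = len(values) - 1
    return k * sum((values[i + 1] + values[i]) ** 2 for i in range(k))


odd_path = [2, -1, 5, 7]    # hypothetical f-values, k = 3 edges
even_path = [2, -1, 5]      # hypothetical f-values, k = 2 edges

odd_total = alternating_sum(odd_path)     # equals f(v_0) + f(v_3)
even_total = alternating_sum(even_path)   # equals f(v_0) - f(v_2)
```

Only in the odd case does the alternating sum recover the quantity $f(x) + f(y)$ that appears in $\mathcal{F}_K$.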



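The next step is to bound $R_2$, which depends on the measures $P_x$ and flows $G_{x,y}$, though not on the couplings $P_{x,y}$. Write: 

$$R_2 = \sum_{x \in \hat{\Omega},\ y \notin \hat{\Omega}} |f(x) + \tilde{f}(y)|^2 K(x,y)\mu(x)$$
$$= \sum_{x \in \hat{\Omega},\ y \notin \hat{\Omega}} \Big| \sum_{z} P_y[z] \big(f(x) + f(z)\big) \Big|^2 K(x,y)\mu(x)$$
$$\le \sum_{x \in \hat{\Omega},\ y \notin \hat{\Omega}} \sum_{z} P_y[z] \big(f(x) + f(z)\big)^2 K(x,y)\mu(x)$$
$$\le \sum_{x \in \hat{\Omega},\ y \notin \hat{\Omega}} \sum_{z} P_y[z] \sum_{\gamma \in \Gamma_{x,z}} G_{x,z}[\gamma]\, k[\gamma] \sum_{i=0}^{k[\gamma]-1} \big(f(v_{x,z,i+1}) + f(v_{x,z,i})\big)^2 K(x,y)\mu(x)$$

where the last inequality is Cauchy-Schwarz. The next step is to write $(f(x) + f(z))^2$ in terms of quantities which appear in $\mathcal{F}_Q$. To do so, note that 

(18) $$(f(x) + f(z))^2 = \Big( \sum_{\gamma \in \Gamma_{x,z}} G_{x,z}[\gamma] \sum_{i=0}^{k(\gamma)-1} (-1)^i \big(f(v_{x,z,i+1}) + f(v_{x,z,i})\big) \Big)^2$$
$$\le \sum_{\gamma \in \Gamma_{x,z}} G_{x,z}[\gamma] \Big( \sum_{i=0}^{k(\gamma)-1} (-1)^i \big(f(v_{x,z,i+1}) + f(v_{x,z,i})\big) \Big)^2$$
$$\le \sum_{\gamma \in \Gamma_{x,z}} G_{x,z}[\gamma]\, k[\gamma] \sum_{i=0}^{k(\gamma)-1} \big(f(v_{x,z,i+1}) + f(v_{x,z,i})\big)^2$$

where both inequalities are just Cauchy-Schwarz. From this bound, the coefficient of $(f(q) + f(r))^2$ in $R_2$ is at most 

(19) $$[(f(q) + f(r))^2] R_2 \le \sum_{\gamma_{x,z} \ni (q,r)} t_{(q,r)}(\gamma_{x,z})\, k[\gamma_{x,z}]\, G_{x,z}[\gamma_{x,z}] \sum_{y \notin \hat{\Omega}} P_y[z] K(x,y)\mu(x)$$
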


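Finally, it is necessary to bound $R_3$. Write 

$$R_3 = \sum_{x,y \in \Omega \setminus \hat{\Omega}} |\tilde{f}(x) + \tilde{f}(y)|^2 K(x,y)\mu(x)$$
$$= \sum_{x,y \in \Omega \setminus \hat{\Omega}} \Big| \sum_{a,b \in \hat{\Omega}} P_{x,y}[a,b] \big(f(a) + f(b)\big) \Big|^2 K(x,y)\mu(x)$$
$$\le \sum_{x,y \in \Omega \setminus \hat{\Omega}} \sum_{a,b \in \hat{\Omega}} P_{x,y}[a,b] \big(f(a) + f(b)\big)^2 K(x,y)\mu(x)$$

Using inequality (18), this gives 

$$R_3 \le \sum_{x,y \in \Omega \setminus \hat{\Omega}} \sum_{a,b \in \hat{\Omega}} P_{x,y}[a,b] \sum_{\gamma \in \Gamma_{a,b}} G_{a,b}[\gamma]\, k[\gamma] \sum_{i=0}^{k(\gamma)-1} \big(f(v_{a,b,i+1}) + f(v_{a,b,i})\big)^2 K(x,y)\mu(x)$$

In particular, the coefficient of $(f(q) + f(r))^2$ in this upper bound is 

(20) $$[(f(q) + f(r))^2] R_3 \le \sum_{\gamma_{a,b} \ni (q,r)} G_{a,b}[\gamma_{a,b}]\, t_{(q,r)}(\gamma_{a,b})\, k[\gamma_{a,b}] \sum_{(x,y) \in E,\ x,y \notin \hat{\Omega}} P_{x,y}[a,b] K(x,y)\mu(x)$$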



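Combining inequalities (17), (19) and (20), the coefficient of $(f(q) + f(r))^2$ in $R_1 + 2R_2 + R_3$ is bounded by 

$$[(f(q) + f(r))^2](R_1 + 2R_2 + R_3) \le \sum_{\gamma_{x,y} \ni (q,r)} G_{x,y}(\gamma_{x,y})\, t_{(q,r)}(\gamma_{x,y})\, k[\gamma_{x,y}]\, K(x,y)\mu(x)$$
$$+ 2 \sum_{\gamma_{x,z} \ni (q,r)} t_{(q,r)}(\gamma_{x,z})\, k[\gamma_{x,z}]\, G_{x,z}[\gamma_{x,z}] \sum_{y \notin \hat{\Omega}} P_y[z] K(x,y)\mu(x)$$
$$+ \sum_{\gamma_{a,b} \ni (q,r)} G_{a,b}[\gamma_{a,b}]\, t_{(q,r)}(\gamma_{a,b})\, k[\gamma_{a,b}] \sum_{(x,y) \in E,\ x,y \notin \hat{\Omega}} P_{x,y}[a,b] K(x,y)\mu(x)$$

On the other hand, the coefficient of $(f(q) + f(r))^2$ in $\mathcal{F}_Q(f, f)$ is at least $Q(q, r)\nu(q)$. Thus, setting 

$$A = \sup_{Q(q,r) > 0} \frac{1}{Q(q,r)\nu(q)} \Big( \sum_{\gamma_{x,y} \ni (q,r)} G_{x,y}[\gamma_{x,y}]\, t_{(q,r)}(\gamma_{x,y})\, k[\gamma_{x,y}]\, K(x,y)\mu(x)$$
$$+ 2 \sum_{\gamma_{x,z} \ni (q,r)} t_{(q,r)}(\gamma_{x,z})\, k[\gamma_{x,z}]\, G_{x,z}[\gamma_{x,z}] \sum_{y \notin \hat{\Omega}} P_y[z] K(x,y)\mu(x)$$
$$+ \sum_{\gamma_{a,b} \ni (q,r)} G_{a,b}[\gamma_{a,b}]\, t_{(q,r)}(\gamma_{a,b})\, k[\gamma_{a,b}] \sum_{(x,y) \in E,\ x,y \notin \hat{\Omega}} P_{x,y}[a,b] K(x,y)\mu(x) \Big)$$

this implies 

$$\mathcal{F}_K(\tilde{f}, \tilde{f}) \le A\, \mathcal{F}_Q(f, f)$$

which completes the proof. 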



8. Comparison for Chains on Continuous State Spaces 

In this section, we write down an analogue to Theorem [4] for Markov chains on continuous state spaces, based on Theorem 3.2 of [24]. It is necessary to develop some notation and definitions. 

We consider two state spaces $\hat{S} \subset S$, with measurable sets $\mathcal{F}$ and $\hat{\mathcal{F}} = \{A \cap \hat{S} \mid A \in \mathcal{F}\}$. Then let $K(x, dy)$ and $Q(x, dy)$ be measurable kernels on $(S, \mathcal{F})$ and $(\hat{S}, \hat{\mathcal{F}})$ with stationary distributions $\mu$ and $\nu$. Again, the goal will be to describe the mixing properties of $Q$ in terms of the mixing properties of $K$, using spectral information. Although this setup is quite general, and much of the work goes through in greater generality, we will assume that $S$ and $\hat{S}$ are Lebesgue-measurable subsets of $\mathbb{R}^n$. We will write $dx$ for a reference measure on $S$, and we will also assume that $\hat{S}$ has nonzero measure under $dx$. In particular, if $\hat{S}$ is a submanifold of a manifold $S$, we allow $S$ to have positive codimension in $\mathbb{R}^n$, but don't allow $\hat{S}$ to have positive codimension in $S$. This zero-codimension assumption cannot be dropped easily; the Markov kernel on $S$ will generally give no information about kernels on subsets of measure 0. 

Say that a kernel $P$ with stationary distribution $\pi$ is reversible if $\pi(dx)P(x, dy) = \pi(dy)P(y, dx)$. Note that if $P$ is reversible, it is a self-adjoint operator on $L^2(\pi)$, and so in particular has a real spectrum. Let $\lambda_0(P)$ be the infimum of the spectrum of $P$ on the orthogonal complement of 1, and let $\lambda_1(P)$ be the supremum of this spectrum. As in the discrete case, 

$$1 - \lambda_1(P) = \inf \left\{ \frac{((I - P)f, f)_\pi}{\|f\|^2_{L^2(\pi)}} : (f, 1)_\pi = 0,\ f \neq 0 \right\}$$
Say that a kernel $P$ is $\alpha$-lazy if we can write $P(x, dy) = \alpha\delta_x(dy) + (1 - \alpha)\mu_x(dy)$, where $\delta_x$ is the measure concentrated at $x$ and $\mu_x(dy)$ is any measure. If $P$ is $\frac{1}{2}$-lazy, then $\lambda_0(P) \ge 0$, and so can essentially be ignored. For $\epsilon > 0$, also define $\|\gamma_{x,y}\|_\epsilon = \sum_{(u,v) \in \gamma_{xy}} (k_u(v)\rho(u))^{-2\epsilon}$. Unlike the discrete case, a bound on $\lambda_1(P)$ doesn't immediately give a bound on the total variation distance. Instead, we have only 

$$\|\tau P^n - \pi\|_2 \le \|\tau - \pi\|_2\, |\lambda_1(P)|^n$$

It is now possible to set up the main comparison theorem. Assume that $K$ and $Q$ are $\frac{1}{2}$-lazy, and furthermore that we can write $K(x, dy) = \frac{1}{2}\delta_x + k_x(y)dy$, $Q(x, dy) = \frac{1}{2}\delta_x + q_x(y)dy$, $\mu(dy) = \rho(y)dy$ and $\nu(dy) = \tau(y)dy$ for the reference measure $dy$ on $S$. As in the discrete theory, the first step is to define for all functions $f$ on $\hat{S}$ their extensions $\tilde{f}$ to $S$. Define for all $x \in S \setminus \hat{S}$ a measure $r_x(a)da$ on $\hat{S}$, and set $\tilde{f}(x) = \int_{\hat{S}} f(a) r_x(a)da$. It is also necessary to define couplings $r_{x,y}(a,b)\,da\,db$ of the measures $r_x(a)da$ and $r_y(b)db$. 
Finally, it is necessary to define paths. This is slightly more complicated than in the discrete situation. For fixed $x, y \in \hat{S}$ and kernel $K$, let a sequence $x = v_0, v_1, \ldots, v_k = y$ be called a path from $x$ to $y$ if $k_{v_i}(v_{i+1}) > 0$ for all $0 \le i < k$. Say that $(x,y)$ requires a path if $r_{a,b}(x,y) > 0$ for some pair $a, b$ with $\mu(a)k_a(b) > 0$, and denote by $V \subset \hat{S}^2$ the collection of pairs requiring a path. Then for fixed $x, y$, let $\Gamma_{xy}$ be the collection of paths from $x$ to $y$, and let $G : (x, y) \to \gamma_{xy} \in \Gamma_{xy}$ be a choice of a single element $\gamma_{xy} \in \Gamma_{xy}$ for each pair $(x,y) \in V$. For the fixed $\gamma_{xy} \in \Gamma_{xy}$, let $|\gamma_{xy}|$ be the number of elements in $\gamma_{xy}$, and let $\gamma_{xy}[i]$ be the $i$'th element. Unlike the discrete case, some regularity assumptions are also needed. 

Let $\mathcal{V} = \{(x,y,i) : (x, y) \in V,\ 1 \le i \le |\gamma_{xy}|\}$. Say that $G$ satisfies the first regularity condition if the map $T(x,y,i) = (G(x,y)[i-1], G(x,y)[i], |G(x,y)|, i)$ from $\mathcal{V}$ to $\hat{S}^2 \times \mathbb{N}^2$ is injective. Then, for all $(b, i) \in \mathbb{N}^2$ such that $(u, v, b, i) \in T(\mathcal{V})$ for some $(u, v) \in \hat{S}^2$, let $W_{b,i} = \{(u,v) : (u,v,b,i) \in T(\mathcal{V})\}$. Assume that $G$ satisfies the first regularity condition, and define the 1 to 1 map $H_{b,i} : W_{b,i} \to \hat{S}^2$ given by $H_{b,i}(u,v) = (x,y)$ where $T(x,y,i) = (u,v,b,i)$. Say that $G$ satisfies the second regularity condition if $H_{b,i}$ can be extended to a bijection of open sets with continuous partial derivatives a.e. with respect to Lebesgue measure. For the remainder of this paper, we will denote this extension by $H_{b,i}$ as well. 

Assuming the two regularity conditions hold, define, for all $(b, i) \in \mathbb{N}^2$ such that $(u, v, b, i) \in T(\mathcal{V})$ for some $(u, v) \in \hat{S}^2$, $J_{b,i}(u,v)$ to be the Jacobian of the change of variables $H_{b,i}(u, v) = (x, y)$. Note that these regularity conditions make the continuous comparison theorem substantially harder to use; they mean that a small change in an edge must result in only a small change of the path between its endpoints. Despite this, continuous versions of Theorems [5] and [6] are still easy. 
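
The first regularity condition can be illustrated on a finite toy example. This is a sketch under assumptions that are not in the paper: the state space is a handful of integers rather than a subset of $\mathbb{R}^n$, paths are 0-indexed lists, and the edge index $i$ runs from 1 to the number of edges. Both path choices (`straight_path` and `hub_path`) are hypothetical.

```python
def edge_map(path_choice, pairs):
    """Collect T(x, y, i) = (G(x,y)[i-1], G(x,y)[i], |G(x,y)|, i) for every edge."""
    images = {}
    for x, y in pairs:
        path = path_choice(x, y)
        for i in range(1, len(path)):
            images[(x, y, i)] = (path[i - 1], path[i], len(path), i)
    return images


def is_injective(images):
    """True when distinct triples (x, y, i) have distinct images under T."""
    return len(set(images.values())) == len(images)


def straight_path(x, y):
    """Walk from x up to y in unit steps (assumes integers with x < y)."""
    return list(range(x, y + 1))


def hub_path(x, y, hub=0):
    """Route every pair through a common hub vertex."""
    return [x, hub, y]


pairs = [(x, y) for x in range(1, 5) for y in range(x + 1, 5)]
straight_ok = is_injective(edge_map(straight_path, pairs))
hub_ok = is_injective(edge_map(hub_path, pairs))
```

For the straight paths, an edge together with the path length and the edge index determines both endpoints, so $T$ is injective. For the hub paths, the first edge $(x, \text{hub})$ carries no information about $y$, so the condition fails.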
Theorem 12 (Comparison for Chains on Continuous State Spaces). Under the conditions described above, for all $\epsilon \in \mathbb{R}$, 

$$((I - K)\tilde{f}, \tilde{f})_\mu \le A_\epsilon\, ((I - Q)f, f)_\nu$$

where 

A e = essup( UtV)( zE{(qx(y)r(u)y^ 2e) 

x W^,y\\^(y)p( x )\ J ^y( u ^ v )\} 

v-(l-2e) 



+ essup M€E {(q x (z)T(u)Y 



- E ,,,„// 

x p(a)|J ab (u,f)|} 
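Proof. Start by writing 

$$((I - K)\tilde{f}, \tilde{f})_\mu = \frac{1}{2} \iint_{S \times S} (\tilde{f}(x) - \tilde{f}(y))^2 \rho(x) k_x(y)\,dx\,dy$$
$$= \frac{1}{2} \iint_{\hat{S} \times \hat{S}} (f(x) - f(y))^2 \rho(x) k_x(y)\,dx\,dy$$
$$+ \iint_{\hat{S} \times (S \setminus \hat{S})} (f(x) - \tilde{f}(y))^2 \rho(x) k_x(y)\,dx\,dy$$
$$+ \frac{1}{2} \iint_{(S \setminus \hat{S}) \times (S \setminus \hat{S})} (\tilde{f}(x) - \tilde{f}(y))^2 \rho(x) k_x(y)\,dx\,dy$$
$$= \frac{1}{2} R_1 + R_2 + \frac{1}{2} R_3$$
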

The goal is to compare this to ((I-Q)f, /)„ = §/ j SxS {f{x)- f{y)) 2 T{x)q x {y)dxdy. 
Ri is bounded exactly as in Theorem 3.2 of [23] : 

J-Ri = o / / ~ f(y)) 2 p( x ) k x(y)dxdy 

A A J JSxS 

(21) < esswp (u ^ )ei; {(g x (y)T(w))~ (1 ~ 2e) 

x llT*,vlle^(3/M»)l^etf(tt,«)|}((^ - Q)/, /% 
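Next, we bound $R_2$ by writing 

$$R_2 = \iint_{\hat{S} \times (S \setminus \hat{S})} (f(x) - \tilde{f}(y))^2 \rho(x) k_x(y)\,dx\,dy$$
$$= \iint_{\hat{S} \times (S \setminus \hat{S})} \Big( f(x) - \int_{z \in \hat{S}} r_y(z) f(z)\,dz \Big)^2 \rho(x) k_x(y)\,dx\,dy$$
$$= \iint_{\hat{S} \times (S \setminus \hat{S})} \Big( \int_{z \in \hat{S}} r_y(z) (f(x) - f(z))\,dz \Big)^2 \rho(x) k_x(y)\,dx\,dy$$
$$\le \iint_{\hat{S} \times (S \setminus \hat{S})} \int_{z \in \hat{S}} (f(x) - f(z))^2 r_y(z)\,dz\, \rho(x) k_x(y)\,dx\,dy$$
$$= \iint_{\hat{S} \times \hat{S}} (f(x) - f(z))^2 \rho(x) \Big( \int_{S \setminus \hat{S}} r_y(z) k_x(y)\,dy \Big) dx\,dz$$
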



This last term is bounded exactly as in Theorem 3.2 of |23], with k x (z) replaced 

R 2 < essup M( z E {(q x (z)T(u)y {1 ~ 2e) 
(22) x ll7^l|e^ 5 r y (^)A; :E ( 2/ )^p(x)|J X2 Kt;)|}((/-g)/,/), 



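Finally, we bound $R_3$ by writing 

$$R_3 = \iint_{(S \setminus \hat{S}) \times (S \setminus \hat{S})} (\tilde{f}(x) - \tilde{f}(y))^2 \rho(x) k_x(y)\,dx\,dy$$
$$= \iint_{(S \setminus \hat{S}) \times (S \setminus \hat{S})} \Big( \int_{a \in \hat{S}} f(a) r_x(a)\,da - \int_{b \in \hat{S}} f(b) r_y(b)\,db \Big)^2 \rho(x) k_x(y)\,dx\,dy$$
$$= \iint_{(S \setminus \hat{S}) \times (S \setminus \hat{S})} \Big( \iint_{a,b \in \hat{S}} (f(a) - f(b)) r_{x,y}(a,b)\,da\,db \Big)^2 \rho(x) k_x(y)\,dx\,dy$$
$$\le \iint_{(S \setminus \hat{S}) \times (S \setminus \hat{S})} \iint_{a,b \in \hat{S}} (f(a) - f(b))^2 r_{x,y}(a,b)\,da\,db\, \rho(x) k_x(y)\,dx\,dy$$
$$= \iint_{\hat{S} \times \hat{S}} (f(a) - f(b))^2 \Big( \iint_{(S \setminus \hat{S}) \times (S \setminus \hat{S})} r_{x,y}(a,b) \rho(x) k_x(y)\,dx\,dy \Big) da\,db$$

This last term is bounded exactly as in Theorem 3.2 of [23], with $\rho(a)k_a(b)$ replaced by $\iint_{(S \setminus \hat{S}) \times (S \setminus \hat{S})} r_{x,y}(a,b) \rho(x) k_x(y)\,dx\,dy$: 

(23) 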

R 3 < essMp (Ui „ )eE {(g a (6)r(n)) _(1_2e) V] | |T<x,fe| U( / / r xy (a,b)p{x)k x (y)dxdy) 
xp(a)\J ab (u,v)\}((I-Q)f,f) u 



The theorem follows by combining inequalities (21), (22) and (23). 



9. Applications to the Spectral Profile 



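In this section, we prove Theorem [9] using the techniques found in [11]. First, we recall the notation in that paper. For $S \subset \hat{\Omega}$, let $c_0(S) = \{f : f \ge 0,\ \mathrm{supp}(f) \subset S,\ f \neq \mathrm{const}\}$. Then define, for kernel $Q$ with stationary distribution $\nu$, 

$$\lambda(S) = \inf_{f \in c_0(S)} \frac{\mathcal{E}_Q(f,f)}{\mathrm{Var}_\nu(f)}$$

And let $\nu_{\min} = \min_{w \in \hat{\Omega}} \nu(w)$. Then define the spectral profile associated with $Q$ by: 

$$\Lambda(r) = \inf_{\nu_{\min} \le \nu(S) \le r} \lambda(S)$$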

Define the spectral profile A associated with K analogously. The main use of this 
definition in this context is through the following immediate consequence of Corollary 
2-1 of |TT): 

Theorem 13 (Spectral Profile Bound). Fix $\epsilon > 0$ and let $X_t$ be a $\frac{1}{2}$-lazy, reversible 
chain with associated spectral profile A. Then for t > ^xtyjdr, 

$$\|\mathcal{L}(X_t) - \nu\|_{TV} \le \epsilon$$

We will use this bound with the following lemma: 

Lemma 14 (Comparison for Spectral Profile). Let $M$ be a matrix with nonnegative entries such that $Mf \in \mathbb{R}^{\Omega}$ is an extension of $f$ for all $f \in \mathbb{R}^{\hat{\Omega}}$. Assume that 

$$\mathcal{E}_Q(f,f) \ge A\, \mathcal{E}_K(Mf, Mf)$$

Furthermore, set C\ = sup xgn jjA. Finally, for S C £1, define S C to be the support 
of Ml s and C 2 = sup 5cX! Then 
Proof. For all $f \in c_0(S)$, 



X(S)> 



> 



gq(/,/) 

Vu(f) 

AS K (Mf,Mf) 



CiV^Mf) 

But by assumption, the support of Mf is contained in S. Thus, 



X(S) > £-\(S) 



The result follows immediately. 



We will now use this lemma along with Theorems [13] and [5] to prove Theorem [9]. 
The distributions and flow will be as in the proof of Theorem [5]; it is easy to check 
that, in the notation of Lemma 14, < 4 in this example. Thus, the only missing 
ingredient is a bound on the spectral profile A associated with simple random walk on 
the torus. By remark 6 following Theorem 1.2 of [6], the simple random walk on the 
torus has a property known as (|, 2) moderate growth (see that paper for a definition 
of moderate growth). By Lemma 5.3 of [7], this walk satisfies what is known as a local 
Poincare inequality, with constant 8 (again, see that paper for a definition of local 
Poincare inequality). We don't use these two properties directly, but combining them 
with the inequality following equation 4.3 of [11], the spectral profile of the random 
walk on the torus satisfies the inequality 

A(r) > ! Jr- - 1^ 1 



27r 



2n 2 



Thus, by Lemma [14] and the comments immediately following it, along with Theorem 
[5], the spectral profile of the walk on the torus with holes satisfies 

A(r) 



> 



An 2 \27r 



- 1 



Theorem [9] follows immediately from this bound and Theorem 13. 